Date Published: April 25, 2019
Publisher: Public Library of Science
Author(s): Randa Aljably, Yuan Tian, Mznah Al-Rodhaan, Abdullah Al-Dhelaan, He Debiao.
The massive reach of social networks (SNs) has brought with it serious concerns, primarily those related to information privacy. Users increasingly rely on social networks for more than merely interactions and self-representation. However, social networking environments are not free of risks. Users are often threatened by privacy breaches, unauthorized access to personal information, and leakage of sensitive data. In this paper, we propose a privacy-preserving model that sanitizes the collection of user information from a social network utilizing restricted local differential privacy (LDP) and stores synthetic copies of the collected data. This model further uses reconstructed data to classify user activity and detect abnormal network behavior. Our experimental results demonstrate that the proposed method achieves high data utility while improving privacy preservation. Moreover, LDP-sanitized data are suitable for use in subsequent analyses, such as anomaly detection. Anomaly detection on the proposed method's reconstructed data achieves a detection accuracy similar to that on the original data.
Information sharing platforms, such as online social networks (OSNs), have experienced remarkable growth and recognition in recent years. Notably, OSN platforms have direct access to the public and private data of their users. In some cases, these data are shared with other parties to carry out analytical and social research. Although the release of social network data is considered a severe breach of privacy, OSN platforms reassure their users by anonymizing their data before sharing them. Unfortunately, data mining techniques can be used to infer sensitive information from released data. Therefore, it is necessary to sanitize network data before releasing it.
Recently, the privacy of social network data has gained increasing attention and concern. Although these types of data are necessary for generating revenue and conducting social research, there is no guarantee that the implemented anonymization techniques will protect users' private information. In this section, we cover state-of-the-art applications of local differential privacy (LDP) in social networks and other fields. LDP in social networks has become an alternative to simple graph anonymization and data aggregation. In one study, out-link privacy was implemented to protect information about individuals that is shared by other users. LDP has even been proposed to solve the non-uniformity problem in two-dimensional multimedia data. Zhou et al. claimed that calculating a standard deviation circle radius defines the divergence of a data grid and allows the dynamic allocation of noise. Their proposed model achieved lower relative errors than similar approaches, such as the UG algorithm. Kim et al. applied LDP to the collection of indoor positioning data and used the differentially private data to estimate the density of a specified indoor area.
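To make the LDP guarantee underlying these works concrete, the following is a minimal sketch (not any of the cited authors' implementations) of epsilon-LDP randomized response for a binary attribute, together with the standard unbiased frequency estimator the collector can apply to the perturbed reports:

```python
import math
import random

def randomized_response(value: bool, epsilon: float) -> bool:
    """Report a binary value under epsilon-LDP randomized response.

    With probability e^eps / (e^eps + 1) the true value is reported;
    otherwise its negation is reported. Each user perturbs locally,
    so the collector never observes raw data.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return value if random.random() < p_truth else not value

def estimate_frequency(reports, epsilon: float) -> float:
    """Unbiased estimate of the true proportion of True values,
    inverting the known perturbation probability."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed + p - 1.0) / (2.0 * p - 1.0)
```

Despite each individual report being noisy, aggregate statistics remain recoverable, which is what makes LDP-sanitized data usable for downstream analysis.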
In this section, we describe the proposed scheme for sanitizing SN user activity logs using LDP. We then compare the results of applying anomaly detection to the original and reconstructed data. The model functions on two servers: a data collection server and a data-analyzing server. As shown in Fig 2, the data collection server represents each activity log as a data sequence. In each sequence, we determine specific salient points. After selecting these points, we use the user's data in addition to other parameters to create random noise. This noise is then added to the data, distorting it from its original values. Finally, the data collection server stores the perturbed data in data repositories.
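The paper's exact noise mechanism and parameters are not specified in this excerpt; as an illustrative sketch, assuming Laplace noise (a common choice in differential privacy), the perturbation step on a user's activity-count sequence could look like:

```python
import random

def perturb_sequence(seq, epsilon: float, sensitivity: float = 1.0):
    """Add Laplace noise to each value of an activity sequence.

    This is a generic LDP-style perturbation sketch; the paper's actual
    mechanism (applied only at selected salient points) may differ.
    The difference of two independent exponential samples with the same
    rate follows a Laplace distribution with scale = sensitivity/epsilon.
    """
    scale = sensitivity / epsilon
    return [x + random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
            for x in seq]
```

A smaller epsilon yields larger noise and stronger privacy; the data-analyzing server later works with reconstructions of such perturbed sequences.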
In this section, we describe the simulation process, including the dataset, parameters, and evaluation metrics. We explain the setup and discuss the results in the second subsection.
We applied the steps of the proposed approach described in Section 5. We first determined the salient points in each user's data stream. The user's activity in Fig 8 does not contain a constant period (having the same number of calls), so all points are selected. However, the user in Fig 9 makes the same number of calls on the sixth and seventh days (dxi = 0). Since the first-order derivative for timestamp 7 is zero, the salient point at this timestamp is removed. The same applies to timestamp 8; however, since it represents the beginning of a decreasing period, it is retained. The colored lines parallel to the y-axis represent the timestamps, and the intersection point between each line and the curve is a salient point.
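The selection rule described above can be sketched as follows. This is a simplified reading of the rule, assuming a point is dropped when its first-order difference is zero (a constant period) unless the next point changes, in which case the current point begins a new increasing or decreasing period and is retained; endpoints are always kept:

```python
def salient_points(values):
    """Return the indices of salient points in an activity sequence.

    Sketch of the rule in the text: drop a point whose first-order
    difference is zero, unless it marks the start of a new trend
    (the following value differs). Sequence endpoints are retained.
    """
    n = len(values)
    keep = []
    for i in range(n):
        if i == 0 or i == n - 1:
            keep.append(i)          # always keep endpoints
            continue
        dx = values[i] - values[i - 1]
        if dx != 0:
            keep.append(i)          # value changed: salient
        elif values[i + 1] != values[i]:
            keep.append(i)          # constant run ends here: new trend begins
    return keep
```

For a sequence with a constant run, e.g. [2, 4, 4, 4, 1], only the interior point of the constant run is dropped; the last repeated value is kept because it starts the decreasing period.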
In this paper, we presented a model for privacy preservation in social networks. The model sanitizes the collected data and sensitive information of SN users using LDP and then attempts to reconstruct the original sequences and perform analyses using sets of selected salient points. We preserve the social structure of each user's communication pattern. The error rate of the estimated data compared to the original data is acceptable for large datasets with small time intervals. Our simulation results show that conducting anomaly detection on synthetic data identifies the same anomalous users and activities as the original data. In the future, we plan to extend the proposed privacy model to estimate noisy data with non-linear approximation.