With the popularity of smartphones, mobile applications (apps) have become
part of people's daily lives. Although apps provide rich functionality, they
also access large amounts of personal information, which raises privacy
concerns. To understand what personal information apps collect, many solutions
have been proposed to detect privacy leaks in apps. Recently, privacy leak
detection based on traffic monitoring has shown promising performance and
strong scalability.
However, it still has two shortcomings. First, it struggles to detect leaks of
obfuscated personal information. Second, it cannot discover leaks of personal
information whose type has not been defined in advance. To address these
problems, this paper proposes a new traffic monitoring-based method for
detecting personal information. We design statistical features that depict the
occurrence patterns of personal information in network traffic, including both
local and global patterns. We then train a detector with machine learning
algorithms to discover potential personal information that exhibits similar
patterns. Since the statistical features are independent of the value and type
of personal information, the trained detector can identify privacy leaks of
various types, including obfuscated ones. To the best of our knowledge, this is
the first work that detects personal information based on statistical
features. Finally, experimental results show that the proposed method achieves
better performance than state-of-the-art methods.
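The abstract does not specify the exact features or the learner. The Python sketch below is therefore only a hypothetical illustration of the general idea, and it rests on several assumptions. Traffic is assumed to be already parsed into per-request key-value pairs tagged with a device ID. The three features (`frequency` and `stability` as local patterns, `distinctness` as a global pattern) are illustrative stand-ins, not the paper's feature set. A 1-nearest-neighbour classifier (`train_1nn`) stands in for whatever machine learning algorithm is actually used. All function names and the toy traffic are invented for this example.

```python
from collections import Counter, defaultdict
from typing import Callable

# Hypothetical input format: one (device_id, {key: value}) pair per request.
Request = tuple[str, dict[str, str]]


def key_features(traffic: list[Request], key: str) -> list[float]:
    """Illustrative value-independent statistics for one key (not the paper's)."""
    values_by_device: dict[str, list[str]] = defaultdict(list)
    requests_by_device: Counter = Counter()
    for device, params in traffic:
        requests_by_device[device] += 1
        if key in params:
            values_by_device[device].append(params[key])
    if not values_by_device:
        return [0.0, 0.0, 0.0]
    n_dev = len(values_by_device)
    # Local pattern 1: share of a device's requests that carry the key.
    frequency = sum(len(v) / requests_by_device[d]
                    for d, v in values_by_device.items()) / n_dev
    # Local pattern 2: how consistently a device repeats its most common value.
    stability = sum(Counter(v).most_common(1)[0][1] / len(v)
                    for v in values_by_device.values()) / n_dev
    # Global pattern: fraction of distinct dominant values across devices.
    dominant = [Counter(v).most_common(1)[0][0] for v in values_by_device.values()]
    distinctness = len(set(dominant)) / n_dev
    return [frequency, stability, distinctness]


def train_1nn(examples: list[tuple[list[float], int]]) -> Callable[[list[float]], int]:
    """A 1-nearest-neighbour detector: 'training' just memorises labelled keys."""
    def detect(features: list[float]) -> int:
        def sq_dist(example: tuple[list[float], int]) -> float:
            return sum((a - b) ** 2 for a, b in zip(example[0], features))
        return min(examples, key=sq_dist)[1]  # label of the closest known key
    return detect


# Toy traffic from two devices; "h" plays the role of an obfuscated (hashed) ID.
traffic: list[Request] = []
for dev, uid, digest, t0 in [("A", "a1", "9f3c", 100), ("B", "b7", "41aa", 200)]:
    for i in range(3):
        traffic.append((dev, {"uid": uid, "ts": str(t0 + i), "lang": "en",
                              "h": digest, "ver": "1.2", "nonce": f"{dev}{i}"}))

# Label a few known keys (1 = personal information, 0 = not) and train.
detector = train_1nn([(key_features(traffic, k), label)
                      for k, label in [("uid", 1), ("ts", 0), ("lang", 0)]])

for key in ["h", "ver", "nonce"]:
    print(key, detector(key_features(traffic, key)))
# prints: h 1, then ver 0, then nonce 0
```

No feature inspects the value itself. As a result, the hashed `h` produces the same feature vector as the plaintext `uid` and is flagged even though it was never labelled. A realistic detector would need richer features, traffic from many more devices, and a stronger learner than this toy nearest-neighbour rule.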