Web tracking harms user privacy. As a result, the use of tracker detection
and blocking tools is a common practice among Internet users. However, no such
tool can be perfect, and thus there is a trade-off between avoiding breakage
(caused by unintentionally blocking some required functionality) and neglecting
to block some trackers. State-of-the-art tools usually rely on user reports and
developer effort to detect breakage, which broadly has two causes:
1) misidentifying non-trackers as trackers, and 2) blocking mixed
trackers, which blend tracking with functional components.
We propose incorporating a machine learning-based breakage detector into the
tracker detection pipeline to automatically avoid misidentification of
functional resources. For both tracker detection and breakage detection, we
propose using differential features that can more clearly elucidate the
differences caused by blocking a request. We designed and implemented a
prototype of our proposed approach, Duumviri, for non-mixed trackers. We then
adapt it to automatically identify mixed trackers, drawing differential
features at partial-request granularity.
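The idea of differential features can be illustrated with a minimal sketch: visit a page once normally and once with a single request blocked, then describe the request by how the page changed. Everything below is an assumption made for illustration. The measurement names in `PageObservation`, the thresholds, and the rule-based `decide` function are hypothetical stand-ins for the learned tracker and breakage detectors, not Duumviri's actual features or models.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PageObservation:
    """Hypothetical measurements from one page visit (names are illustrative)."""
    dom_nodes: int
    visible_text_chars: int
    cookies_set: int
    third_party_requests: int
    console_errors: int


def differential_features(control: PageObservation,
                          blocked: PageObservation) -> dict[str, float]:
    """Relative change of each measurement when one request is blocked."""
    features = {}
    for field in fields(PageObservation):
        before = getattr(control, field.name)
        after = getattr(blocked, field.name)
        # Divide by the control value (at least 1) so that pages of
        # different sizes yield comparable numbers.
        features[field.name] = (after - before) / max(before, 1)
    return features


def decide(features: dict[str, float],
           tracking_drop: float = -0.2,
           breakage_change: float = 0.1) -> str:
    """Toy thresholds standing in for the two learned detectors."""
    looks_like_tracker = (features["cookies_set"] <= tracking_drop
                          or features["third_party_requests"] <= tracking_drop)
    causes_breakage = (abs(features["dom_nodes"]) >= breakage_change
                       or abs(features["visible_text_chars"]) >= breakage_change
                       or features["console_errors"] > 0)
    if looks_like_tracker and not causes_breakage:
        return "block"
    if looks_like_tracker and causes_breakage:
        return "review"  # tracking signal present, but blocking breaks the page
    return "allow"


control = PageObservation(1200, 8000, 10, 40, 0)
quiet_block = PageObservation(1200, 8000, 6, 30, 0)    # fewer cookies, page intact
breaking_block = PageObservation(700, 3000, 6, 30, 2)  # page content also changed
print(decide(differential_features(control, quiet_block)))
print(decide(differential_features(control, breaking_block)))
```

In this sketch a request whose blocking both removes tracking signals and changes the page is only flagged for review. Such a request is the kind that would call for analysis at a finer granularity than the whole request.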
For non-mixed trackers, an evaluation of Duumviri on 15K pages shows its
ability to replicate the labels of the human-generated filter list EasyPrivacy
with an accuracy of 97.44%. Through a manual analysis, we find that Duumviri
can identify previously unreported trackers and its breakage detector can
identify overly strict EasyPrivacy rules that cause breakage. In the case of
mixed trackers, Duumviri is the first automated mixed tracker detector, and
achieves a lower-bound accuracy of 74.19%. Duumviri has enabled us to detect
and confirm 22 previously unreported unique trackers and 26 unique mixed
trackers.
arXiv
Cited by: 1
Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning
Muhammad Ikram, Hassan Jameel Asghar, Mohamed Ali Kaafar, Balachander Krishnamurthy, Anirban Mahanti
Published: 2016.3.21
Numerous tools have been developed to aggressively block the execution of
popular JavaScript programs (JS) in Web browsers. Such blocking also affects
functionality of webpages and impairs user experience. As a consequence, many
privacy-preserving tools (PP-Tools) that have been developed to limit online
tracking, often executed via JS, may suffer from poor performance and limited
uptake. A mechanism that can isolate JS necessary for proper functioning of the
website from tracking JS would thus be useful. Through the use of a manually
labelled dataset composed of 2,612 JS, we show how current PP-Tools are
ineffective in finding the right balance between blocking tracking JS and
allowing functional JS. To the best of our knowledge, this is the first study
to assess the performance of current web PP-Tools.
To improve this balance, we examine the two classes of JS and hypothesize
that tracking JS share structural similarities that can be used to
differentiate them from functional JS. The rationale of our approach is that
web developers often borrow and customize existing pieces of code in order to
embed tracking (resp. functional) JS into their webpages. We then propose
one-class machine learning classifiers using syntactic and semantic features
extracted from JS. When trained only on samples of tracking JS, our classifiers
achieve an accuracy of 99%, whereas the best of the PP-Tools achieves an
accuracy of 78%.
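To make the one-class setting concrete, the following minimal sketch trains on tracking JS only and accepts a new script if it is at least as similar to the training set as the least typical training sample. It is a deliberately simple centroid-similarity model over token bigrams, chosen so that it runs with the standard library alone. It is not one of the classifiers evaluated in the paper, and the class name `CentroidOneClass`, the tokenizer, and the sample scripts are all hypothetical.

```python
import math
import re
from collections import Counter

# Identifiers, integers, or single punctuation characters.
TOKEN = re.compile(r"[A-Za-z_$][\w$]*|\d+|[^\s\w]")


def featurize(js_source: str) -> Counter:
    """Bag of token bigrams: a crude proxy for syntactic structure."""
    tokens = TOKEN.findall(js_source)
    return Counter(zip(tokens, tokens[1:]))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class CentroidOneClass:
    """One-class model fitted on positive (tracking) samples only."""

    def fit(self, tracking_samples):
        vectors = [featurize(s) for s in tracking_samples]
        self.centroid = Counter()
        for vector in vectors:
            self.centroid.update(vector)
        # Accept anything at least as close to the centroid as the
        # least typical training sample.
        self.threshold = min(cosine(v, self.centroid) for v in vectors)
        return self

    def is_tracking(self, js_source: str) -> bool:
        return cosine(featurize(js_source), self.centroid) >= self.threshold


tracking = [
    "var img = new Image(); img.src = 'https://t.example/p?uid=' + document.cookie;",
    "var px = new Image(); px.src = 'https://x.example/c?id=' + document.cookie;",
    "var b = new Image(); b.src = 'https://y.example/b?u=' + document.cookie + navigator.userAgent;",
]
functional = ("function add(a, b) { return a + b; } "
              "document.getElementById('menu').classList.toggle('open');")

model = CentroidOneClass().fit(tracking)
print(model.is_tracking(tracking[0]), model.is_tracking(functional))
```

The point of the sketch is the training regime rather than the model: no functional JS is needed at training time, so the decision boundary is drawn around the tracking class alone.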
We further test our classifiers and several popular PP-Tools on a corpus of
4K websites with 135K JS. On this data, the output of our best classifier differs
from that of the PP-Tools by between 20% and 64%. We manually analyse a sample of
the JS for which our classifier is in disagreement with all other PP-Tools, and
show that our approach is not only able to enhance user web experience by
correctly classifying more functional JS, but also discovers previously unknown
tracking services.
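The comparison described above can be sketched as two small computations: a per-tool disagreement rate, and the set of scripts on which the classifier disagrees with every tool, which is the set a manual analysis would sample from. The labels, script names, and tool names below are invented for illustration and are not data from the study.

```python
def disagreement_rate(ours: dict[str, bool], tool: dict[str, bool]) -> float:
    """Fraction of commonly labelled scripts on which the two labellings differ."""
    shared = ours.keys() & tool.keys()
    if not shared:
        raise ValueError("no scripts labelled by both sources")
    return sum(ours[url] != tool[url] for url in shared) / len(shared)


def disputed_by_all(ours: dict[str, bool],
                    tools: dict[str, dict[str, bool]]) -> list[str]:
    """Scripts where our label differs from the label of every tool."""
    return sorted(
        url for url in ours
        if all(url in labels and labels[url] != ours[url]
               for labels in tools.values())
    )


# True means "tracking"; all values here are hypothetical.
ours = {"a.js": True, "b.js": False, "c.js": True, "d.js": False}
tools = {
    "tool_x": {"a.js": True, "b.js": True, "c.js": False, "d.js": False},
    "tool_y": {"a.js": True, "b.js": True, "c.js": True, "d.js": False},
}

for name, labels in tools.items():
    print(name, disagreement_rate(ours, labels))
print(disputed_by_all(ours, tools))
```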