Abstract
Just as our physical activities leave traces, our virtual presence leaves
behind unique digital fingerprints as we navigate the Internet. These
digital fingerprints can reveal users' activities, including browsing
history, the applications used, and even the devices involved. Many Internet
users tend to use web browsers that provide the highest level of privacy
protection and anonymization, such as Tor. The
success of such privacy protection depends on Tor's ability to anonymize
end-user IP addresses and the other metadata that make up a website
fingerprint. In this paper, we show that an attacker can deanonymize Tor
traffic using recent machine learning algorithms.
In our experimental framework, we establish a baseline and comparative
reference point using a publicly available dataset from Universidad Del Cauca,
Colombia. We capture network packets over 11 days while users navigate
specific web pages, recording the data in .pcapng format with the Wireshark
network capture tool. After excluding extraneous packets, we apply various
machine learning algorithms in our analysis. The results show that the Gradient
Boosting Machine algorithm performs best in binary classification,
achieving an accuracy of 0.8363. In multi-class classification, the Random
Forest algorithm attains an accuracy of 0.6297.
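
The preprocessing and evaluation steps described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the `Packet` record, the rule that a packet without payload counts as extraneous, and the four summary features are all assumptions chosen for illustration. The `accuracy` function shows the metric reported above, the fraction of correctly classified samples, which applies equally to the binary and multi-class settings.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """One captured packet (hypothetical layout, not the paper's schema)."""
    timestamp: float    # seconds since the start of the capture
    payload_size: int   # payload bytes carried by the packet
    outgoing: bool      # True if the packet was sent by the client


def drop_extraneous(packets, min_payload=1):
    """Keep packets carrying at least min_payload bytes (assumed rule)."""
    return [p for p in packets if p.payload_size >= min_payload]


def flow_features(packets):
    """Summarize the packets of one page visit as a feature dictionary."""
    if not packets:
        raise ValueError("at least one packet is required")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    # Inter-arrival times between consecutive packets.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "n_packets": len(ordered),
        "total_bytes": sum(p.payload_size for p in ordered),
        "outgoing_fraction": sum(p.outgoing for p in ordered) / len(ordered),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy page visit; the second packet has no payload and is filtered out.
visit = [
    Packet(0.0, 512, True),
    Packet(0.5, 0, False),
    Packet(1.0, 1024, False),
    Packet(2.0, 512, False),
]
features = flow_features(drop_extraneous(visit))

# Toy labels: three of the four predictions are correct.
score = accuracy(["tor", "tor", "other", "tor"],
                 ["tor", "other", "other", "tor"])
```

In a complete experiment, feature dictionaries of this kind would be computed per visit and passed to a classifier such as a gradient boosting or random forest model; that step is omitted here because it depends on an external library.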