Abstract
The Onion Router (Tor) is a controversial network whose utility is constantly
under scrutiny. On the one hand, it enables anonymous interaction and
cooperation among users seeking untraceable navigation on the Internet. On the
other hand, this freedom attracts criminals who aim to thwart law enforcement
investigations, e.g., by trading illegal products or services such as drugs or
weapons. Tor allows content to be delivered without revealing the actual
hosting address by means of .onion (or hidden) services. Unlike regular
domains, these services cannot be resolved by traditional name services, are
not indexed by regular search engines, and change frequently. This generates
uncertainty about the extent and size of the Tor network and the type of
content it offers.
In this work, we present a large-scale analysis of the Tor network. We
leverage our crawler, dubbed Mimir, which automatically visits pages and
follows the links they contain, building a dataset of pages from more than
25k sites. We analyze the topology of the Tor network, including its depth and
its reachability from the surface web. We define a set of heuristics to detect
the presence of replicated content (mirrors) and show that most of the analyzed
content in the Dark Web (approx. 82%) is a replica of other content. We also
train a custom machine learning classifier to understand the type of content
the hidden services offer. Overall, our study provides new insights into the
Tor network, highlighting the importance of initial seeding for focusing on
specific topics and for optimizing the crawling process. We show that previous
work on large-scale Tor measurements does not consider the presence of
mirrors, which biases its understanding of the Dark Web topology and the
distribution of content.