In this paper, we analyze the topology and the content found on the
"darknet", the set of websites accessible via Tor. We created a darknet spider
and crawled the darknet starting from a bootstrap list by recursively following
links. We explored the whole connected component of more than 34,000 hidden
services, of which we found 10,000 to be online. Contrary to popular belief,
the visible part of the darknet is surprisingly well-connected through hub
websites such as wikis and forums. We performed a comprehensive categorization
of the content using supervised machine learning. According to our classifier,
about half of the visible dark web content relates to apparently licit
activities. A significant amount of content pertains to software
repositories, blogs, and activism-related websites. Most unlawful hidden
services are fraudulent websites, sell counterfeit goods, or operate drug
markets.
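
The crawl described above (start from a bootstrap list, follow links recursively, and record which services respond) can be illustrated with a minimal breadth-first sketch. This is not our spider: the function name `crawl_component`, the callback `fetch_links`, and the toy link graph `TOY_LINKS` are hypothetical and exist only for illustration, and the example runs on an in-memory dictionary rather than over Tor.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def crawl_component(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Optional[list[str]]],
) -> tuple[set[str], set[str]]:
    """Breadth-first exploration of every address reachable from the seeds.

    fetch_links(site) is assumed to return the outgoing links of a site,
    or None when the site cannot be reached (offline).
    Returns (discovered addresses, addresses found online).
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate, keep seed order
    discovered = set(queue)
    online = set()
    while queue:
        site = queue.popleft()
        links = fetch_links(site)
        if links is None:
            continue  # address is known but did not respond
        online.add(site)
        for target in links:
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return discovered, online


# Hypothetical toy graph: addresses missing from the dict count as offline.
TOY_LINKS = {
    "wiki.onion": ["forum.onion", "shop.onion", "blog.onion"],
    "forum.onion": ["wiki.onion", "dead.onion"],
    "blog.onion": [],
    "island.onion": ["wiki.onion"],  # links in, but nothing links to it
}

discovered, online = crawl_component(["wiki.onion"], TOY_LINKS.get)
print(len(discovered), "discovered,", len(online), "online")
```

In this toy graph the hub `wiki.onion` leads to every other reachable address, while `island.onion` is never discovered because no crawled page links to it; a crawl of this kind only sees the component connected to its bootstrap list.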