Paper Information
- Author
- Abdulrahman Alabduljabbar; Runyu Ma; Ahmed Abusnaina; Rhongho Jang; Songqing Chen; DaeHun Nyang; David Mohaisen
- Published
- 4-26-2023
- Affiliation
- University of Central Florida
- Country
- United States of America
- Conference
- Computing Research Repository (CoRR)
Abstract
Free content websites that provide free books, music, games, movies, etc.,
have existed on the Internet for many years. While it is a common belief that
such websites might be different from premium websites providing the same
content types, an analysis that supports this belief is lacking in the
literature. In particular, it is unclear if those websites are as safe as their
premium counterparts. In this paper, we set out to investigate, by analysis and
quantification, the similarities and differences between free content and
premium websites, including their risk profiles. To conduct this analysis, we
assembled a list of 834 free content websites offering books, games, movies,
music, and software, and 728 premium websites offering content of the same
type. We then contribute domain-, content-, and risk-level analyses, examining
and contrasting the websites' domain names, creation times, SSL certificates,
HTTP requests, page size, average load time, and content type. For risk
analysis, we consider and examine the maliciousness of these websites at the
website and component levels. Among other interesting findings, we show that
free content websites tend to be widely distributed across TLDs and exhibit
more dynamics, with an upward trend in newly registered domains. Moreover, the
free content websites are 4.5 times more likely to utilize an expired
certificate, 19 times more likely to be malicious at the website level, and
2.64 times more likely to be malicious at the component level. Encouraged by
the clear differences between the two types of websites, we explore automating
and generalizing the risk modeling of risky free content websites, showing
that a simple machine learning-based technique can identify them with
86.81% accuracy.
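
A statement such as "4.5 times more likely" is usually read as a ratio of two proportions: the share of free content websites showing a property divided by the share of premium websites showing it. The minimal Python sketch below illustrates that arithmetic under this assumption. The group sizes (834 and 728) come from the abstract; the flagged counts and the function name `relative_likelihood` are hypothetical placeholders, not figures or methods from the paper.

```python
def relative_likelihood(flagged_free, total_free, flagged_premium, total_premium):
    """Ratio of the flagged share among free sites to the flagged share among premium sites."""
    if total_free <= 0 or total_premium <= 0:
        raise ValueError("group sizes must be positive")
    rate_free = flagged_free / total_free
    rate_premium = flagged_premium / total_premium
    if rate_premium == 0:
        raise ValueError("premium rate is zero, so the ratio is undefined")
    return rate_free / rate_premium


# Group sizes are taken from the abstract.
TOTAL_FREE = 834
TOTAL_PREMIUM = 728

# HYPOTHETICAL flagged counts, chosen only to show the calculation.
ratio = relative_likelihood(100, TOTAL_FREE, 20, TOTAL_PREMIUM)
print(f"free sites are {ratio:.2f}x as likely to be flagged")
```

Because the two groups differ in size, comparing proportions rather than raw counts keeps the ratio meaningful.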