Malicious calls, i.e., telephony spam and scams, have been a long-standing
challenge that causes billions of dollars in annual financial losses
worldwide. This work presents the first machine learning-based solution that
does not rely on any particular assumptions about the underlying telephony
network infrastructure. The main challenge of this decade-long problem is that
it is unclear how to construct effective features without access to the
telephony network infrastructure. We solve this problem by combining several
innovations. We first develop a TouchPal user interface on top of a mobile app
that lets users tag malicious calls. This allows us to maintain a
large-scale call log database. We then conduct a measurement study over three
months of call logs, including 9 billion records. We design 29 features based
on the results, so that machine learning algorithms can be used to predict
malicious calls. We extensively evaluate different state-of-the-art machine
learning approaches using the proposed features, and the results show that the
best approach can reduce unblocked malicious calls by up to 90% while maintaining
a precision over 99.99% on the benign call traffic. The results also show that the
models are efficient to implement without incurring a significant latency
overhead. We also conduct an ablation analysis, which reveals that using 10 out of
the 29 features achieves performance comparable to using all features.
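
The trade-off reported above (blocking as many malicious calls as possible while leaving benign traffic almost untouched) can be illustrated with a minimal sketch. It is not the paper's evaluation code. It assumes a classifier that outputs a maliciousness score per call, blocks every call whose score is at or above a threshold, and reads the benign-traffic constraint as "the fraction of benign calls that are not blocked", which is only one possible interpretation of the metric. The names `evaluate`, `pick_threshold`, and `min_benign_pass` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def evaluate(scores: Sequence[float], labels: Sequence[int],
             threshold: float) -> Tuple[float, float]:
    """Return (benign pass rate, malicious block rate) at a threshold.

    Assumption: label 1 marks a malicious call, label 0 a benign call,
    and a call is blocked when its score is >= threshold.
    """
    benign = [s for s, y in zip(scores, labels) if y == 0]
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    benign_pass = sum(s < threshold for s in benign) / len(benign)
    malicious_blocked = sum(s >= threshold for s in malicious) / len(malicious)
    return benign_pass, malicious_blocked


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   min_benign_pass: float) -> float:
    """Return the lowest threshold that meets the benign pass-rate target.

    The benign pass rate never decreases as the threshold rises, so the
    first candidate that meets the target also blocks the most malicious
    calls among all thresholds that meet it.
    """
    candidates = sorted(set(scores)) + [math.inf]  # inf blocks nothing
    for t in candidates:
        benign_pass, _ = evaluate(scores, labels, t)
        if benign_pass >= min_benign_pass:
            return t
    return math.inf


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the call log database.
    scores = [0.05, 0.10, 0.20, 0.80, 0.30, 0.85, 0.95]
    labels = [0, 0, 0, 0, 1, 1, 1]
    t = pick_threshold(scores, labels, min_benign_pass=1.0)
    print(t, evaluate(scores, labels, t))
```

On real data the target would be set to the desired benign-traffic level (for example 0.9999), and the threshold would be chosen on a validation split rather than on the data used for the final report.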