In this work, we study the privacy risk due to profile matching across online
social networks (OSNs), in which anonymous profiles of OSN users are matched to
their real identities using auxiliary information about them. We consider
different attributes that are publicly shared by users. Such attributes include
both strong identifiers, such as user name, and weak identifiers, such as
variation in interest or sentiment between a user's posts on different
platforms. We study how different combinations of these attributes affect
profile matching in order to characterize the privacy threat
comprehensively. The proposed framework mainly relies on machine learning
techniques and optimization algorithms. We evaluate the proposed framework on
three datasets (Twitter - Foursquare, Google+ - Twitter, and Flickr) and show
how profiles of the users in different OSNs can be matched with high
probability by using the publicly shared attributes and/or the underlying
graph structure of the OSNs. We also show that the proposed framework
achieves notably higher precision than state-of-the-art methods that
rely on machine learning techniques. We believe that this work will be a
valuable step toward building a tool that helps OSN users understand the
privacy risks of their public sharing.