GitHub provides developers with a practical way to distribute source code and
collaboratively work on common projects. To enhance account security and
privacy, GitHub allows its users to manage access permissions, review audit
logs, and enable two-factor authentication. However, despite these efforts, the
platform still faces various issues related to the privacy of its users. This
paper presents an empirical study of the GitHub ecosystem. We focus on how users
employ the privacy settings available on the platform and on the types of
sensitive information they disclose. Using a dataset of 6,132 developers, we
report and analyze their activity in the form of comments on pull requests. Our findings
indicate that users actively engage with the privacy settings available on
GitHub. Notably, we also observe various forms of private information disclosed
in pull request comments. This observation prompted us to explore sensitivity
detection using a large language model and BERT, paving the way for a
personalized privacy assistant. Our work provides
insights into the use of existing privacy protection tools, such as privacy
settings, and into their inherent limitations. Ultimately, we aim to advance
research in this field by providing both the motivation for building such
privacy protection tools and a proposed methodology for personalizing them.
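The abstract describes sensitivity detection only at a high level. As a rough illustration of the task, and not the authors' method (which relies on a large language model and BERT), the following Python sketch flags a few well-structured kinds of sensitive strings in a pull request comment using regular expressions. The category names, the patterns, and the `flag_sensitive` helper are hypothetical and chosen only for this example.

```python
import re
from typing import Dict, List

# Hypothetical categories and patterns, for illustration only. They are not
# the categories or the LLM/BERT-based detector used in the study.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    # Simple e-mail address pattern (not a full RFC 5322 validator).
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # Dotted-quad IPv4 address with each octet limited to 0-255.
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    # PEM header of a private key accidentally pasted into a comment.
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def flag_sensitive(comment: str) -> Dict[str, List[str]]:
    """Return the matches found per category, omitting categories with no hits."""
    hits: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        # All groups are non-capturing, so findall returns whole matches.
        found = pattern.findall(comment)
        if found:
            hits[label] = found
    return hits


if __name__ == "__main__":
    comment = "Ping me at jane.doe@example.com, staging is 10.0.0.12"
    print(flag_sensitive(comment))
```

Pattern matching of this kind only catches disclosures with a fixed format; personal details written in free text slip past it, which is one plausible reason to turn to learned models such as the large language model and BERT mentioned above.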
External datasets: GHTorrent; Active users