Exploring User Privacy Awareness on GitHub: An Empirical Study
Abstract
GitHub provides developers with a practical way to distribute source code and collaborate on shared projects. To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication. Despite these ongoing efforts, the platform still faces various issues related to the privacy of its users. This paper presents an empirical study of the GitHub ecosystem. We investigate how privacy settings are used on the platform and identify the types of sensitive information that users disclose. Using a dataset of 6,132 developers, we report and analyze their activities through their comments on pull requests. Our findings indicate that users actively engage with the privacy settings available on GitHub. We also observe that different forms of private information are disclosed in pull request comments. This observation prompted us to explore sensitivity detection using a large language model and BERT, to pave the way for a personalized privacy assistant. Our work provides insights into the use of existing privacy protection tools, such as privacy settings, along with their inherent limitations. We aim to advance research in this field by providing both the motivation for creating such privacy protection tools and a proposed methodology for personalizing them.