A Privacy-Preserving Framework Using Remote Data Science for Inter-Institutional Student Retention Prediction
Abstract
This study explores privacy-preserving machine learning (PPML) techniques using the PySyft platform to enable collaborative prediction of student retention between institutions. We developed a remote data science (RDS) framework with a semi-air-gapped architecture consisting of high-side and low-side servers, allowing researchers from three universities to build predictive models on sensitive student data without direct data access. Using historical data from a small private university (N=720), we evaluated three synthetic data generation approaches and validated the framework through inter-institutional collaboration. The results demonstrate consistent classification performance across institutions (Macro F1: 0.690 to 0.695) while maintaining strict Family Educational Rights and Privacy Act (FERPA) compliance. We also propose Data-Type-Aware Templates, a novel synthetic data generation method that prioritizes privacy over distributional fidelity. Our findings confirm that RDS-based PPML is technically feasible for educational settings and offers a practical alternative to federated learning for small-scale inter-institutional collaborations. The code is available at https://github.com/jtfields/NAIRR240195-Privacy-Preserving-Machine-Learning.
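For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro F1 score is computed: the F1 score is calculated separately for each class and the results are averaged with equal weight, so a small class counts as much as a large one. The function name `macro_f1` and the example labels are illustrative assumptions, not taken from the paper or its repository.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only: 1 = retained, 0 = not retained
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(macro_f1(y_true, y_pred))  # 0.625 (class 1: 0.75, class 0: 0.5)
```

In practice the same value can be obtained with scikit-learn's `f1_score(y_true, y_pred, average="macro")`; the hand-written version above only makes the averaging explicit.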