This study explores privacy-preserving machine learning (PPML) techniques using the PySyft platform to enable collaborative prediction of student retention across institutions. We developed a remote data science (RDS) framework with a semi-air-gapped architecture consisting of high-side and low-side servers, which allows researchers from three universities to build predictive models on sensitive student data without direct access to that data. Using historical data from a small private university (N=720), we evaluated three synthetic data generation approaches and validated the framework through an inter-institutional collaboration. The results demonstrate consistent classification performance across institutions (macro F1: 0.690–0.695) while maintaining strict compliance with the Family Educational Rights and Privacy Act (FERPA). We also propose Data-Type-Aware Templates, a novel synthetic data generation method that prioritizes privacy over distributional fidelity. Our findings confirm that RDS-based PPML is technically feasible in educational settings and offers a practical alternative to federated learning for small-scale inter-institutional collaborations. The code is available at https://github.com/jtfields/NAIRR240195-Privacy-Preserving-Machine-Learning.
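The abstract describes the high-side/low-side workflow only at a high level. Below is a minimal, library-free sketch of that pattern under stated assumptions. It does not use PySyft, and every name in it is hypothetical: `SCHEMA`, `TOY_PRIVATE_ROWS`, `schema_template_rows`, `retention_rule`, and `HighSideServer`. The mock-data generator is only one possible reading of a data-type-aware template. It draws values from generic per-type ranges and never from the private data, so it is not the authors' method. Macro F1 is computed as the unweighted mean of the per-class F1 scores.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical schema published to the low side: column name -> Python type.
SCHEMA: dict[str, type] = {
    "risk_score": float,
    "credits": int,
    "first_gen": bool,
    "retained": bool,
}

# Made-up records standing in for private high-side data (illustration only).
TOY_PRIVATE_ROWS: list[dict] = [
    {"risk_score": 0.2, "credits": 30, "first_gen": False, "retained": True},
    {"risk_score": 0.7, "credits": 12, "first_gen": True, "retained": False},
    {"risk_score": 0.4, "credits": 24, "first_gen": True, "retained": False},
    {"risk_score": 0.9, "credits": 6, "first_gen": False, "retained": False},
]


def schema_template_rows(schema: dict[str, type], n: int, seed: int = 0) -> list[dict]:
    """Generate mock rows from column types alone (an assumed interpretation of a
    data-type-aware template). No statistic of the private data is used, so the
    mock rows expose the schema but not the real distributions."""
    rng = random.Random(seed)
    makers: dict[type, Callable[[], object]] = {
        float: lambda: rng.random(),        # placeholder range [0, 1)
        int: lambda: rng.randint(0, 100),   # placeholder range [0, 100]
        bool: lambda: rng.random() < 0.5,   # fair coin
    }
    return [{col: makers[typ]() for col, typ in schema.items()} for _ in range(n)]


def macro_f1(y_true: list[bool], y_pred: list[bool]) -> float:
    """Unweighted mean of the per-class F1 scores for the classes True and False."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for cls in (True, False):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); score 0.0 if the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def retention_rule(rows: list[dict]) -> float:
    """Researcher's toy analysis: predict retention when risk_score < 0.5."""
    y_true = [row["retained"] for row in rows]
    y_pred = [row["risk_score"] < 0.5 for row in rows]
    return macro_f1(y_true, y_pred)


@dataclass
class HighSideServer:
    """Holds private rows and executes only code the data owner has approved."""
    private_rows: list[dict]
    approved: set[str] = field(default_factory=set)

    def approve(self, name: str) -> None:
        # Simplification: approval is keyed by name, not by the exact code reviewed.
        self.approved.add(name)

    def run(self, name: str, fn: Callable[[list[dict]], float]) -> float:
        if name not in self.approved:
            raise PermissionError(f"{name!r} has not been approved")
        return fn(self.private_rows)  # only the aggregate score is released


if __name__ == "__main__":
    # Low side: the researcher debugs the analysis on schema-only mock data.
    mock_rows = schema_template_rows(SCHEMA, n=50)
    print("dry run on mock data:", round(retention_rule(mock_rows), 3))

    # High side: the data owner approves the request, then runs it on private rows.
    high_side = HighSideServer(private_rows=TOY_PRIVATE_ROWS)
    high_side.approve("retention_rule")
    print("released macro F1:", round(high_side.run("retention_rule", retention_rule), 3))
```

The dry-run score on mock data is meaningless by design, because the template values carry no signal. Its only purpose is to show that the code executes against the real schema. Only the aggregate number returned by `run` crosses back to the low side. A real deployment would bind approval to the exact code that was reviewed rather than to a function name.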
External datasets
Faketucky