The quest for better data analysis and artificial intelligence has led to
more and more data being collected and stored. As a consequence, more data are
exposed to malicious entities. This paper examines the problem of privacy in
machine learning for classification. We use Ridge Discriminant Component
Analysis (RDCA) to desensitize data with respect to a privacy label.
Based on five experiments, we show that desensitization by RDCA can effectively
protect privacy (i.e., low accuracy on the privacy label) with a small loss in
utility. On the HAR and CMU Faces datasets, using desensitized data yields
random-guess-level accuracy on the privacy label, at a cost of average drops of
5.14% and 0.04%, respectively, in utility accuracy. For the Semeion Handwritten Digit dataset,
accuracies on the privacy-sensitive digits are almost zero, while the
accuracies on the utility-relevant digits drop by 7.53% on average. These
results suggest that desensitization by RDCA is a promising solution to the
problem of privacy in machine learning for classification.