Feature selection is the process of sieving features, in which informative
features are separated from the redundant and irrelevant ones. This process
plays an important role in machine learning, data mining and bioinformatics.
However, traditional feature selection methods are only capable of processing
centralized datasets and are not able to satisfy today's distributed data
processing needs. These needs call for a new category of data processing
algorithms, called privacy-preserving feature selection, which protect users'
data by revealing no part of it, either during intermediate processing or in
the final results. This is vital for datasets that contain individuals' data,
such as medical datasets. Therefore, it is reasonable
to either modify existing algorithms or propose new ones that not only can be
applied to distributed datasets, but also handle users' data responsibly by
protecting their privacy. In this paper,
we will review three privacy-preserving feature selection methods and suggest
ways to improve their performance wherever a gap is identified. We will
also propose a privacy-preserving feature selection method based on rough set
feature selection. The proposed method can process both horizontally and
vertically partitioned datasets in two-party and multi-party scenarios.