Personally identifiable information (PII) can find its way into cyberspace
through various channels, and many potential sources can leak such information.
Data sharing (e.g. cross-agency data sharing) for machine learning and
analytics is an important component of data science. However, due to
privacy concerns, data must be protected with strong privacy guarantees before
sharing. Various approaches have been developed for privacy-preserving
data sharing; however, identifying the best privacy-preservation
approach for a given dataset remains a
challenge. Several factors can influence the efficacy of the process, such
as the characteristics of the input dataset, the strength of the
privacy-preservation approach, and the expected level of utility of the
resulting dataset (for the corresponding data mining application, such as
classification). This paper presents a framework named \underline{P}rivacy
\underline{P}reservation \underline{a}s \underline{a} \underline{S}ervice
(PPaaS) to reduce this complexity. The proposed method employs selective
privacy preservation via data perturbation and examines the different dynamics that
can influence the quality of privacy preservation for a dataset. PPaaS
includes pools of data perturbation methods and, for each application and
input dataset, selects the most suitable data perturbation approach after
rigorous evaluation. It enhances the usability of the privacy-preserving methods
within its pool. PPaaS is a generic platform that can be used to sanitize big data
in a granular, application-specific manner by employing a suitable combination
of diverse privacy-preserving algorithms to provide a proper balance between
privacy and utility.
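
The selection step described above (evaluate each perturbation method in a pool, then keep the one with the best privacy-utility trade-off) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three pool members, the function names, the distortion-based privacy proxy, the moment-preservation utility proxy, and the threshold rule. An actual evaluation would presumably measure utility on the target application (for example, classification accuracy).

```python
import random
import statistics

# Hypothetical pool members: each takes a numeric column and a seeded RNG.
def additive_noise(values, rng, scale=0.5):
    """Add zero-mean Gaussian noise proportional to the column's spread."""
    sd = statistics.pstdev(values)
    return [v + rng.gauss(0.0, scale * sd) for v in values]

def multiplicative_noise(values, rng, scale=0.1):
    """Multiply each value by a random factor centred on 1."""
    return [v * (1.0 + rng.gauss(0.0, scale)) for v in values]

def identity(values, rng):
    """No perturbation; included only as a zero-privacy baseline."""
    return list(values)

def privacy_score(original, perturbed):
    """Assumed proxy: mean absolute per-record distortion in std units, capped at 1."""
    sd = statistics.pstdev(original) or 1.0
    distortion = statistics.fmean([abs(o - p) for o, p in zip(original, perturbed)])
    return min(distortion / sd, 1.0)

def utility_score(original, perturbed):
    """Assumed proxy: how well the mean and spread of the column are preserved."""
    sd = statistics.pstdev(original) or 1.0
    mean_shift = abs(statistics.fmean(original) - statistics.fmean(perturbed)) / sd
    spread_shift = abs(statistics.pstdev(original) - statistics.pstdev(perturbed)) / sd
    return max(0.0, 1.0 - mean_shift - spread_shift)

def select_perturbation(values, pool, min_privacy, seed=0):
    """Return (name, privacy, utility) of the highest-utility method
    whose privacy proxy meets the required threshold."""
    best = None
    for name, method in pool.items():
        rng = random.Random(seed)  # same seed per method for a fair comparison
        perturbed = method(values, rng)
        privacy = privacy_score(values, perturbed)
        utility = utility_score(values, perturbed)
        if privacy < min_privacy:
            continue  # reject methods that distort the data too little
        if best is None or utility > best[2]:
            best = (name, privacy, utility)
    if best is None:
        raise ValueError("no method in the pool meets the privacy threshold")
    return best

POOL = {
    "additive_noise": additive_noise,
    "multiplicative_noise": multiplicative_noise,
    "identity": identity,
}

column = [float(i) for i in range(1, 101)]  # toy numeric attribute
name, privacy, utility = select_perturbation(column, POOL, min_privacy=0.05)
print(name, round(privacy, 3), round(utility, 3))
```

Filtering on a privacy threshold first and then maximizing utility is only one possible selection rule; a weighted score over both quantities would be an equally plausible choice under these assumptions.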