These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Data is the key asset for organizations and data sharing is lifeline for
organization growth; which may lead to data loss. Data leakage is the most
critical issue being faced by organizations. In order to mitigate the data
leakage issues data leakage prevention systems (DLPSs) are deployed at various
levels by the organizations. DLPSs are capable to protect all kind of data i.e.
DAR, DIM/DIT, DIU. Statistical analysis, regular expression, data
fingerprinting are common approaches exercised in DLP system. Out of these
techniques; statistical analysis approach is most appropriate for proposed DLP
model of data security. This paper defines a statistical DLP model for document
classification. Model uses various statistical approaches like TF-IDF (Term
Frequency- Inverse Document Frequency) a renowned term count/weighing function,
Vectorization, Gradient boosting document classification etc. to classify the
documents before allowing any access to it. Machine learning is used to test
and train the model. Proposed model also introduces an extremely efficient and
more accurate approach; IGBCA (Improvised Gradient Boosting Classification
Algorithm); for document classification, to prevent them from possible data
leakage. Results depicts that proposed model can classify documents with high
accuracy and on basis of which data can be prevented from being loss.