[Abstract]: As technology advances, privacy protection is in growing demand: more and more personal data are shared across organizations, devices, and the Internet of Things, which puts the privacy and use of those data at risk. In particular, enforcing confidentiality tends to reduce the utility of a data set. The problem is therefore to protect the privacy of a larger data set while preserving multi-utility. This study establishes a new privacy model that combines the Flash sorting algorithm and K-anonymity with C4.5 classification to protect private data while preserving the optimal utility of the data set. The first step applies a powerful data privacy grant technique to a statistical data set of 30,162 records and their attributes and uses the Flash sorting algorithm to select the best K-anonymous data set. The privacy level is set to 2, and the C4.5 classification process is then used to make the data set as useful as possible. Next, the data set is halved (15,081 records) and tested with the same method. The number of attributes is then reduced and the same algorithm is tested again. Compared with other studies, the proposed method maintains the accuracy of the data. The results show a data utility of 90.77%; when the data set is reduced to half its size, the loss of utility is only 0.5, and when the number of attributes is reduced, the loss of utility is only 2.28. On the large data set, the loss is 1.24 relative to the original non-anonymized data set. Although the method yields high accuracy, it does not reach the best expected result on the larger data set. The results show that our method gives the lowest utility loss when the data set size and the number of attributes are reduced.
The study predicts that changing the privacy method and using different types of classifiers will produce better results in future work, especially on larger data sets.
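To make the privacy level of 2 mentioned above concrete, the following is a minimal sketch of a K-anonymity check. It is not the model proposed in the study. The function name `is_k_anonymous`, the toy records, and the choice of `age` and `zip` as quasi-identifiers are all assumptions made for illustration. The Flash sorting selection step and the C4.5 classification step are not shown.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records (not taken from the study's data set).
records = [
    {"age": "30-39", "zip": "130**", "income": "<=50K"},
    {"age": "30-39", "zip": "130**", "income": ">50K"},
    {"age": "40-49", "zip": "148**", "income": "<=50K"},
    {"age": "40-49", "zip": "148**", "income": "<=50K"},
]

# Each (age, zip) combination appears exactly twice, so the table
# satisfies K-anonymity for k = 2 but not for k = 3.
print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(records, ["age", "zip"], 3))
```

With K = 2, every record shares its quasi-identifier values with at least one other record. A check of this kind could be run on each candidate anonymized data set before its utility is measured with a classifier.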
