On the comparison of some fuzzy clustering methods for privacy preserving data mining: Towards the development of specific information loss measures

Vicenç Torra; Yasunori Endo; Sadaaki Miyamoto

Kybernetika (2009)

  • Volume: 45, Issue: 3, page 548-560
  • ISSN: 0023-5954

Abstract

Policy makers and researchers require raw data collected from agencies and companies for their analyses. Nevertheless, any transmission of data to third parties should satisfy privacy requirements in order to avoid the disclosure of sensitive information. The areas of privacy preserving data mining and statistical disclosure control develop mechanisms for ensuring data privacy. Masking methods are one such mechanism. With them, third parties can perform computations with a limited risk of disclosure. Disclosure risk and information loss measures have been developed to evaluate to what extent data is protected and to what extent it is perturbed. Most of the information loss measures currently in the literature are general purpose ones (i.e., not oriented to a particular application). In this work we develop cluster-specific information loss measures for fuzzy clustering. For this purpose we study how to compare the results of fuzzy clustering, i.e., how to compare fuzzy clusters.

How to cite

Torra, Vicenç, Endo, Yasunori, and Miyamoto, Sadaaki. "On the comparison of some fuzzy clustering methods for privacy preserving data mining: Towards the development of specific information loss measures." Kybernetika 45.3 (2009): 548-560. <http://eudml.org/doc/37669>.

@article{Torra2009,
author = {Torra, Vicenç and Endo, Yasunori and Miyamoto, Sadaaki},
journal = {Kybernetika},
keywords = {privacy preserving data mining; statistical disclosure control; fuzzy clustering; fuzzy c-means; fuzzy c-means with tolerance},
language = {eng},
number = {3},
pages = {548--560},
publisher = {Institute of Information Theory and Automation AS CR},
title = {On the comparison of some fuzzy clustering methods for privacy preserving data mining: Towards the development of specific information loss measures},
url = {http://eudml.org/doc/37669},
volume = {45},
year = {2009},
}

TY - JOUR
AU - Torra, Vicenç
AU - Endo, Yasunori
AU - Miyamoto, Sadaaki
TI - On the comparison of some fuzzy clustering methods for privacy preserving data mining: Towards the development of specific information loss measures
JO - Kybernetika
PY - 2009
PB - Institute of Information Theory and Automation AS CR
VL - 45
IS - 3
SP - 548
EP - 560
LA - eng
KW - privacy preserving data mining; statistical disclosure control; fuzzy clustering; fuzzy c-means; fuzzy c-means with tolerance
UR - http://eudml.org/doc/37669
ER -

