Accuracy in Privacy-Preserving Data Mining Using the Paradigm of Cryptographic Elections

Abstract.

Data mining technology raises concerns about the handling and use of sensitive information, especially in highly distributed environments where the participants in the system may by mutually mistrustful. In this paper we argue in favor of using some well-known cryptographic primitives, borrowed from the literature on large-scale Internet elections, in order to preserve accuracy in Privacy-Preserving Data Mining (PPDM) systems. Our approach is based on the classical homomorphic model for online elections, and more particularly on some extensions of the model for supporting multi-candidate elections. We also describe some weaknesses and present an attack on a recent scheme (Yang et al, 2005)) which was the first to use a variation of the homomorphic model in the PPDM setting. In addition, we show how PPDM can be used as a building block to obtain a Random Forests classification algorithm over a set of homogeneous databases with horizontally partitioned data.

Key words: Privacy Preserving Data Mining; Privacy and Accuracy; Homomorphic encryption; Distributed Data Mining; Statistical Databases.

Download: (PDF file)