Preserving Privacy When Sharing Distributed Transactional Data
6 Pages. Posted: 20 Jan 2009
Date Written: December 8, 2007
The need to preserve privacy when sharing data across organizations is recognized as an important issue. For transactional data, privacy is usually preserved by explicitly hiding sensitive information before sharing. Often, the data owner stores the data to be shared in a distributed manner, with the database horizontally partitioned to reflect the firm's operations in different locations or regions. In such situations, the owner must hide sensitive patterns not only in the consolidated database but also within each partition of the distributed database. We present an Integer Programming (IP) formulation that minimizes the data distortion introduced into a distributed database while hiding sensitive patterns. The formulation can become large for distributed databases with multiple partitions, and the IP may then not be solvable to optimality. For such situations, we propose three alternative procedures (Procedure A, Procedure B, and Procedure Hybrid) that exploit the distributed nature of the data to decompose the larger problem into a series of smaller problems. We examine the performance of these procedures using computational experiments. The major findings are: i) problems too large to be solved to optimality can be solved easily by these three procedures; ii) the differences between the solutions obtained from either Procedure A or Procedure B and the optimal solutions are quite small; and iii) the hybrid procedure, which incorporates in its formulation the commonalities between the solutions provided by the other two procedures, obtains solutions that are even closer to optimal.
Keywords: data quality, distributed data, accuracy, heuristics, Integer Programming
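The hiding problem described in the abstract can be illustrated on a toy instance. The sketch below is not the paper's IP formulation, nor Procedure A, B, or Hybrid: it is a brute-force search, practical only for very small inputs, that finds a smallest set of item removals after which every sensitive itemset is hidden both in each partition and in the consolidated database. The function names, the thresholds `max_local` and `max_global`, and the choice to measure distortion as the number of removed items are all assumptions made for illustration.

```python
from itertools import combinations


def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_hidden(partitions, sensitive, max_local, max_global):
    """True if each sensitive itemset occurs at most `max_local` times in
    every partition and at most `max_global` times overall (assumed rule)."""
    consolidated = [t for part in partitions for t in part]
    for itemset in sensitive:
        if support(consolidated, itemset) > max_global:
            return False
        if any(support(part, itemset) > max_local for part in partitions):
            return False
    return True


def apply_removals(partitions, removals):
    """Return a copy of the partitions with the given items removed.
    Each removal is a (partition index, transaction index, item) triple."""
    result = [[set(t) for t in part] for part in partitions]
    for p, i, item in removals:
        result[p][i].discard(item)
    return result


def min_distortion_hiding(partitions, sensitive, max_local, max_global):
    """Try removal sets of increasing size; the first feasible one is a
    minimum-size set. Exponential in the worst case (illustration only)."""
    sensitive_items = set().union(*sensitive)
    # Only items of sensitive itemsets, in transactions that support
    # at least one sensitive itemset, can help hide anything.
    candidates = [
        (p, i, item)
        for p, part in enumerate(partitions)
        for i, t in enumerate(part)
        if any(s <= t for s in sensitive)
        for item in sorted(t & sensitive_items)
    ]
    for k in range(len(candidates) + 1):
        for removals in combinations(candidates, k):
            sanitized = apply_removals(partitions, removals)
            if is_hidden(sanitized, sensitive, max_local, max_global):
                return list(removals), sanitized
    return None, None


# Hypothetical data: two regional partitions of one transactional database.
partitions = [
    [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}],
    [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}],
]
sensitive = [frozenset({"a", "b"}), frozenset({"b", "c"})]

removals, sanitized = min_distortion_hiding(
    partitions, sensitive, max_local=1, max_global=2
)
print(len(removals), removals)
```

In this instance each partition initially holds two transactions that support {a, b}, so at least one removal is needed in each partition, and the search returns a two-removal solution. Enumeration of this kind grows exponentially with the number of candidate removals, which is consistent with the abstract's motivation for decomposing the problem by partition.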