Preserving Privacy When Sharing Distributed Transactional Data

6 Pages Posted: 20 Jan 2009

Jing Hao

University of Texas at Dallas - Department of Information Systems & Operations Management

Syam Menon

University of Texas at Dallas - Naveen Jindal School of Management

Sumit Sarkar

University of Texas at Dallas - Department of Information Systems & Operations Management

Date Written: December 8, 2007

Abstract

The need to preserve privacy when sharing data across organizations has been recognized as an important issue. In the context of transactional data, privacy is usually preserved by explicitly hiding sensitive information prior to sharing. Often, the data to be shared is stored in a distributed manner by the data owner, where the database is horizontally partitioned to reflect the firm's operations in different locations or regions. In such situations, the owner must hide sensitive patterns not only in the consolidated database but also within each partition of the distributed database. We present an Integer Programming (IP) formulation for minimizing data distortion in a distributed database while hiding sensitive patterns. The formulation can become large for distributed databases with multiple partitions, and the IP may not be solvable. For such situations, we propose three alternative procedures (Procedure A, Procedure B, and Procedure Hybrid) that exploit the distributed nature of the data to decompose the larger problem into a series of smaller problems. We examine the performance of these procedures using computational experiments. The major findings are: i) problems too large to be solved to optimality can be solved easily by all three procedures; ii) the differences between the solutions obtained from either Procedure A or Procedure B and the optimal solutions are quite small; and iii) the hybrid procedure, which incorporates in its formulation commonalities between the solutions provided by the other two procedures, obtains solutions that are even closer to optimal.
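
The hiding problem described in the abstract can be made concrete with a toy example. The sketch below is not the paper's IP formulation or any of its three procedures; it is a brute-force search under simplifying assumptions introduced here for illustration: distortion is measured as the number of transactions altered, an altered transaction no longer supports any sensitive itemset, and a pattern counts as hidden when its support is at most `local_limit` in every partition and at most `global_limit` in the consolidated database. All names, limits, and data are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset, sanitized=frozenset()):
    """Count unsanitized transactions (id, items) that contain the itemset."""
    return sum(
        1 for tid, items in transactions
        if tid not in sanitized and itemset <= items
    )


def min_distortion(partitions, sensitive, local_limit, global_limit):
    """Return a smallest set of transaction ids to alter so that every
    sensitive itemset is hidden in each partition and in the union.

    Assumption: altering a transaction removes its support for all
    sensitive itemsets. Exhaustive search; only viable for tiny inputs.
    """
    all_tx = [t for part in partitions for t in part]
    # Only transactions supporting some sensitive itemset can help.
    candidates = [
        tid for tid, items in all_tx
        if any(s <= items for s in sensitive)
    ]
    for k in range(len(candidates) + 1):
        for chosen in combinations(candidates, k):
            chosen = frozenset(chosen)
            hidden = all(
                support(all_tx, s, chosen) <= global_limit
                and all(
                    support(part, s, chosen) <= local_limit
                    for part in partitions
                )
                for s in sensitive
            )
            if hidden:
                return chosen
    return None


# Hypothetical data: two partitions of a horizontally partitioned database.
partition_1 = [(1, {"a", "b", "c"}), (2, {"a", "b"}), (3, {"b", "c"})]
partition_2 = [(4, {"a", "b"}), (5, {"a", "c"}), (6, {"a", "b", "d"})]
sensitive_itemsets = [frozenset({"a", "b"})]

result = min_distortion(
    [partition_1, partition_2], sensitive_itemsets,
    local_limit=1, global_limit=3,
)
print(sorted(result))
```

In this toy instance the consolidated limit alone would require altering only one transaction, but the per-partition limit forces one alteration in each partition, so two transactions must change. Exhaustive enumeration grows exponentially with the number of supporting transactions, which is in line with the abstract's motivation for decomposing the problem by partition.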

Keywords: data quality, distributed data, accuracy, heuristics, Integer Programming

Suggested Citation

Hao, Jing and Menon, Syam and Sarkar, Sumit, Preserving Privacy When Sharing Distributed Transactional Data (December 8, 2007). Available at SSRN: https://ssrn.com/abstract=1326556 or http://dx.doi.org/10.2139/ssrn.1326556

Jing Hao

University of Texas at Dallas - Department of Information Systems & Operations Management

P.O. Box 830688
Richardson, TX 75083-0688
United States

Syam Menon

University of Texas at Dallas - Naveen Jindal School of Management

P.O. Box 830688
Richardson, TX 75083-0688
United States

Sumit Sarkar (Contact Author)

University of Texas at Dallas - Department of Information Systems & Operations Management

P.O. Box 830688
Richardson, TX 75083-0688
United States
972-883-6854 (Phone)
972-883-6811 (Fax)
