TY - JOUR
T1 - Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking
AU - Qiu, Guowei
AU - Gui, Xiaolin
AU - Zhao, Yingliang
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2020
Y1 - 2020
AB - Linear regression is a basic method that models the relationship between an outcome value and some explanatory values using a linear function. Traditionally, this method is conducted on a clear dataset provided by one data owner. However, in today's increasingly digital world, the data for regression analysis are likely distributed among multiple parties and may even contain sensitive information about the data owners. In this case, data owners are not willing to share their data unless data privacy is guaranteed. In this paper, we propose a novel protocol for conducting privacy-preserving linear regression (PPLR) on horizontally partitioned data. Our system architecture includes multiple clients and two noncolluding servers. In our protocol, each client submits its data in encrypted form to a server, and the two servers collaboratively determine the regression model on the pooled data without learning its contents. We construct our protocol with Paillier homomorphic encryption and a new data masking technique. This data masking technique perturbs data by multiplying them by a rational number while they remain encrypted. Owing to this data masking technique, the efficiency of our protocol is greatly improved. We provide an error bound for the protocol and prove it rigorously. We also provide a security analysis of the protocol. Finally, we implement our system in C++ and Java and evaluate our protocol using real datasets provided by UCI. The experiments show that our protocol is one of the most effective approaches to date and has negligible error compared with performing linear regression on clear data.
KW - Privacy-preserving regression
KW - data masking
KW - homomorphic encryption
KW - linear regression
KW - multiplicative perturbation
UR - https://www.scopus.com/pages/publications/85086988069
U2 - 10.1109/ACCESS.2020.3000764
DO - 10.1109/ACCESS.2020.3000764
M3 - Article
AN - SCOPUS:85086988069
SN - 2169-3536
VL - 8
SP - 107601
EP - 107613
JO - IEEE Access
JF - IEEE Access
M1 - 9110896
ER -