Distributed semi-supervised learning with kernel ridge regression

Research output: Contribution to journal › Article › peer-review

106 Scopus citations

Abstract

This paper provides an error analysis for distributed semi-supervised learning with kernel ridge regression (DSKRR) based on a divide-and-conquer strategy. DSKRR applies kernel ridge regression (KRR) to data subsets stored distributively on multiple servers to produce individual output functions, and then takes a weighted average of these individual output functions as the final estimator. Using a novel error decomposition that divides the generalization error of DSKRR into approximation error, sample error and distributed error, we find that the sample error and distributed error reflect the power and the limitations of DSKRR compared with KRR applied to the whole data set. In particular, a small distributed error admits a large number of data subsets while still guaranteeing a small generalization error. Our results show that unlabeled data play an important role in reducing the distributed error and enlarging the admissible number of data subsets in DSKRR. Our analysis also covers the case when the regression function lies outside the reproducing kernel Hilbert space. Numerical experiments, including toy simulations and a music-prediction task, demonstrate our theoretical statements and show the power of unlabeled data in distributed learning.
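The divide-and-conquer scheme described in the abstract can be sketched as follows: each server fits a standard KRR estimator on its own data subset, and the final estimator is a weighted average of the local predictions. This is a minimal supervised sketch with an assumed Gaussian kernel and illustrative hyperparameters (`m`, `lam`, `gamma` are hypothetical choices, not values from the paper); the paper's semi-supervised variant additionally augments each subset with unlabeled points, which is omitted here.

```python
import numpy as np

def gauss_kernel(A, B, gamma):
    # Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam, gamma):
    # KRR coefficients: alpha = (K + lam * n * I)^{-1} y
    n = X.shape[0]
    K = gauss_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return X, alpha

def krr_predict(model, Xq, gamma):
    Xtr, alpha = model
    return gauss_kernel(Xq, Xtr, gamma) @ alpha

def dkrr(X, y, Xq, m=4, lam=1e-4, gamma=5.0):
    # Divide-and-conquer KRR: fit on m subsets, then take a
    # weighted average of the local predictions (weights = subset sizes).
    parts = np.array_split(np.arange(len(X)), m)
    preds = np.zeros(len(Xq))
    for idx in parts:
        model = krr_fit(X[idx], y[idx], lam, gamma)
        preds += (len(idx) / len(X)) * krr_predict(model, Xq, gamma)
    return preds
```

Each local solve costs O((n/m)^3) instead of O(n^3) for whole-data KRR, which is the computational motivation for the distributed strategy; the paper's analysis quantifies how large `m` may be before the distributed error degrades the generalization error.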

Original language: English
Pages (from-to): 1-22
Number of pages: 22
Journal: Journal of Machine Learning Research
Volume: 18
State: Published - 1 May 2017

Keywords

  • Distributed learning
  • Error decomposition
  • Kernel ridge regression
  • Learning theory
  • Semi-supervised learning
  • Unlabeled data

