Integrating Web objects extracted from multiple sites into relational database

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper studies the problem of integrating heterogeneous semi-structured Web objects into a relational database. A generalized sequential learning model, named Combined Conditional Random Fields, is presented for schema matching between pairs of heterogeneous Web data sources. The proposed model learns from both manually labeled training data and unlabeled database records, thereby reducing its dependence on laboriously hand-labeled samples. It also provides a novel way to incorporate the two-dimensional neighborhood dependencies between Web data elements. Moreover, a constrained Viterbi algorithm is implemented to perform label inference under the imposed constraints, yielding an optimal data integration. Experimental results on a large number of Web pages from diverse domains show that the proposed method improves matching accuracy significantly.
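The constrained Viterbi step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the form of the constraints, so the sketch assumes the simplest case, where some positions in the sequence have a label fixed in advance (for example, an element already known to match a database column). The function name `constrained_viterbi` and the inputs `emit`, `trans`, and `constraints` are hypothetical, and the scores are assumed to be log-scores from an already trained linear-chain model.

```python
import math

NEG_INF = -math.inf


def constrained_viterbi(emit, trans, constraints=None):
    """Return the best label sequence that honors pre-fixed labels.

    emit[t][k]  : log-score of label k at position t (assumed input)
    trans[j][k] : log-score of moving from label j to label k (assumed input)
    constraints : dict {position: required_label}; an assumed constraint form
    """
    constraints = constraints or {}
    T, K = len(emit), len(emit[0])

    def allowed(t, k):
        # A label is allowed unless the position is pinned to another label.
        return t not in constraints or constraints[t] == k

    score = [[NEG_INF] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]

    # Initialization: only allowed labels receive a finite score.
    for k in range(K):
        if allowed(0, k):
            score[0][k] = emit[0][k]

    # Recursion: disallowed labels keep a score of -inf and are never chosen.
    for t in range(1, T):
        for k in range(K):
            if not allowed(t, k):
                continue
            best_j = max(range(K), key=lambda j: score[t - 1][j] + trans[j][k])
            score[t][k] = score[t - 1][best_j] + trans[best_j][k] + emit[t][k]
            back[t][k] = best_j

    # Backtrack from the best final label.
    path = [max(range(K), key=lambda k: score[T - 1][k])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

With two labels and scores that favor label 0 everywhere, pinning position 1 to label 1 forces the decoder to route the whole path through that label while still choosing the best labels for the remaining positions.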

Original language: English
Pages (from-to): 126-130+153
Journal: Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University
Volume: 34
Issue number: 1
State: Published - Feb 2007

Keywords

  • Conditional random fields
  • Schema matching
  • Web data integration
