
Integrating Web objects extracted from multiple sites into relational database

  • Xidian University

Research output: Contribution to journal › Article › Peer-review

1 Citation (Scopus)

Abstract

This paper studies the problem of integrating heterogeneous semi-structured Web objects into a relational database. A generalized sequential learning model, named Combined Conditional Random Fields, is presented to solve the schema-matching problem between pairs of heterogeneous Web data sources. The proposed model can learn from both manually labeled training data and unlabeled database records, thereby reducing its dependence on tediously labeled samples. It also provides a novel way to incorporate two-dimensional neighborhood dependencies between Web data elements. Moreover, a constrained Viterbi algorithm is implemented to perform label inference under the imposed constraints, yielding optimal data integration. Experimental results on a large number of Web pages from diverse domains show that the proposed method significantly improves matching accuracy.
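The abstract does not say what form the constraints take or how the Combined CRF scores are computed. The sketch below is therefore only a generic illustration of constrained Viterbi decoding over a linear-chain model, not the authors' algorithm. It assumes per-position log-scores (`emission`), label-to-label log-scores (`transition`), and a hypothetical `allowed` map that restricts which labels (for example, database attributes) may appear at a given position. It does not model the two-dimensional neighborhood dependencies described above.

```python
import math
from typing import Dict, List, Optional, Sequence, Set


def constrained_viterbi(
    emission: Sequence[Sequence[float]],    # emission[t][y]: log-score of label y at position t
    transition: Sequence[Sequence[float]],  # transition[p][y]: log-score of moving from label p to y
    allowed: Optional[Dict[int, Set[int]]] = None,  # hypothetical: position -> permitted labels
) -> List[int]:
    """Return the highest-scoring label sequence that satisfies `allowed`."""
    n = len(emission)
    if n == 0:
        return []
    k = len(emission[0])
    allowed = allowed or {}
    neg_inf = -math.inf

    def permitted(t: int, y: int) -> bool:
        # Positions without an entry in `allowed` accept every label.
        return t not in allowed or y in allowed[t]

    score = [[neg_inf] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]

    # Initialise the first position, skipping forbidden labels.
    for y in range(k):
        if permitted(0, y):
            score[0][y] = emission[0][y]

    # Forward pass: forbidden labels keep a score of -inf, so no path uses them.
    for t in range(1, n):
        for y in range(k):
            if not permitted(t, y):
                continue
            best_prev, best = 0, neg_inf
            for p in range(k):
                s = score[t - 1][p] + transition[p][y]
                if s > best:
                    best_prev, best = p, s
            score[t][y] = best + emission[t][y]
            back[t][y] = best_prev

    last = max(range(k), key=lambda y: score[n - 1][y])
    if score[n - 1][last] == neg_inf:
        raise ValueError("constraints admit no valid labeling")

    # Follow back-pointers from the best final label.
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With `emission = [[2, 0], [0, 1], [1, 0]]` and zero transitions, the unconstrained decode is `[0, 1, 0]`. Forbidding label 1 at position 1 (`allowed={1: {0}}`) yields `[0, 0, 0]` instead. Enforcing constraints by masking scores with `-inf` keeps the dynamic program exact: the result is the best path among those that satisfy the constraints.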

Original language: English
Pages (from-to): 126-130+153
Journal: Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University
Volume: 34
Issue number: 1
Publication status: Published - Feb 2007
