Abstract
This paper studies the problem of integrating heterogeneous semi-structured Web objects into a relational database. A generalized sequential learning model, the Combined Conditional Random Fields, is presented for schema matching between pairs of heterogeneous Web data sources. The model learns from both manually labeled training data and unlabeled database records, thereby reducing its dependence on tediously labeled samples. It also provides a novel way to incorporate the two-dimensional neighborhood dependencies between Web data elements. Moreover, a constrained Viterbi algorithm is implemented to perform inference under the imposed labels for optimal data integration. Experimental results on a large number of Web pages from diverse domains show that the proposed method significantly improves matching accuracy.
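As a rough illustration of what constrained Viterbi decoding can look like, the sketch below runs standard Viterbi over a linear chain while forcing selected positions to take a given label. It is not the paper's implementation: the function name `constrained_viterbi`, the score tables `emit` and `trans`, and the `imposed` mapping are all hypothetical, and the abstract's two-dimensional neighborhood dependencies are not modeled here.

```python
NEG_INF = float("-inf")


def constrained_viterbi(emit, trans, imposed=None):
    """Best label path under imposed labels (illustrative sketch only).

    emit[t][k]  : score of label k at position t (hypothetical input)
    trans[j][k] : score of moving from label j to label k (hypothetical input)
    imposed     : dict {position: label} of labels that must be used
    """
    imposed = imposed or {}
    T, K = len(emit), len(emit[0])

    def allowed(t, k):
        # A label is allowed unless another label is imposed at position t.
        return t not in imposed or imposed[t] == k

    score = [[NEG_INF] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]

    for k in range(K):
        if allowed(0, k):
            score[0][k] = emit[0][k]

    for t in range(1, T):
        for k in range(K):
            if not allowed(t, k):
                continue  # disallowed labels keep a score of -inf
            best_j = max(range(K), key=lambda j: score[t - 1][j] + trans[j][k])
            score[t][k] = score[t - 1][best_j] + trans[best_j][k] + emit[t][k]
            back[t][k] = best_j

    # Trace back from the best final label.
    last = max(range(K), key=lambda k: score[T - 1][k])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[T - 1][last]


# Toy example with two labels; transitions favor staying on the same label.
emit = [[2, 0], [0, 1], [2, 0]]
trans = [[1, 0], [0, 1]]
free_path, free_score = constrained_viterbi(emit, trans)
forced_path, forced_score = constrained_viterbi(emit, trans, imposed={1: 1})
```

In this toy example, imposing label 1 at position 1 changes the decoded path from `[0, 0, 0]` to `[0, 1, 0]`, since paths that violate the constraint are excluded from the search rather than merely penalized.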
| Original language | English |
|---|---|
| Pages (from-to) | 126-130+153 |
| Journal | Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University |
| Volume | 34 |
| Issue | 1 |
| Publication status | Published - Feb 2007 |
Fingerprint
Explore the research topics of 'Integrating Web objects extracted from multiple sites into relational database'. Together they form a unique fingerprint.