
A latent topic model for linked documents

  • Zhen Guo
  • Shenghuo Zhu
  • Yun Chi
  • Zhongfei Zhang
  • Yihong Gong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Scopus citations

Abstract

Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly account for the document relations represented by links, we propose a citation-topic (CT) model, which assumes a probabilistic generative process for corpora. In the CT model, a given document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document related to the given document. Moreover, the CT model contains a random process for selecting the related documents according to the structure of the generative model determined by links; the transitivity of the relations among documents is therefore captured. We apply the CT model to the document clustering task, and experimental comparisons against several state-of-the-art approaches demonstrate very promising performance.
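The generative story in the abstract can be illustrated with a small sketch. It is a toy reading of the abstract only, not the authors' specification: the names `pick_source_doc`, `generate_document`, `links`, `doc_topics`, `topic_words`, and `stop_prob` are all hypothetical, and the stop-or-follow walk over links is an assumed stand-in for the paper's random process for selecting related documents, included only to show how multi-hop (transitive) relations could arise.

```python
import random


def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def pick_source_doc(doc, links, stop_prob, rng):
    """Choose the document whose topic distribution is borrowed (cited).

    Assumption: starting at `doc`, stop with probability `stop_prob`,
    otherwise follow a random outgoing link and repeat. This walk is an
    illustration, not the process defined in the paper. `stop_prob` must be
    greater than 0 so the walk terminates on cyclic link graphs.
    """
    current = doc
    while links[current] and rng.random() >= stop_prob:
        current = rng.choice(links[current])
    return current


def generate_document(doc, n_words, links, doc_topics, topic_words, stop_prob, rng):
    """Generate word ids for `doc` as a mixture of borrowed topic distributions."""
    words = []
    for _ in range(n_words):
        source = pick_source_doc(doc, links, stop_prob, rng)  # related document
        topic = sample_index(doc_topics[source], rng)         # topic borrowed from it
        words.append(sample_index(topic_words[topic], rng))   # word from that topic
    return words


# Toy corpus (made-up numbers): 3 documents, 2 topics, 4 vocabulary words.
links = {0: [1], 1: [2], 2: []}                  # 0 links to 1, 1 links to 2
doc_topics = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
topic_words = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

rng = random.Random(0)
sample = generate_document(0, 10, links, doc_topics, topic_words, 0.5, rng)
print(sample)
```

In this toy setup document 0 has no direct link to document 2, yet words from document 2's topic can appear in document 0 because the walk may pass through document 1. That is the kind of transitive relation the abstract refers to, under the assumptions stated above.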

Original language: English
Title of host publication: Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Pages: 720-721
Number of pages: 2
State: Published - 2009
Externally published: Yes
Event: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009 - Boston, MA, United States
Duration: 19 Jul 2009 to 23 Jul 2009

Publication series

Name: Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009

Conference

Conference: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Country/Territory: United States
City: Boston, MA
Period: 19/07/09 to 23/07/09

Keywords

  • Document clustering
  • Topic model
