Document Clustering Based On Non-negative Matrix Factorization

Research output: Contribution to journal › Conference article › Peer-review

1719 Scopus citations

Abstract

In this paper, we propose a novel document clustering method based on the non-negative factorization of the term-document matrix of the given document corpus. In the latent semantic space derived by the non-negative matrix factorization (NMF), each axis captures the base topic of a particular document cluster, and each document is represented as an additive combination of the base topics. The cluster membership of each document can be easily determined by finding the base topic (the axis) with which the document has the largest projection value. Our experimental evaluations show that the proposed document clustering method surpasses the latent semantic indexing and the spectral clustering methods not only in the easy and reliable derivation of document clustering results, but also in document clustering accuracies.
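The clustering rule described in the abstract can be sketched in a few lines. The snippet below is a minimal pure-Python sketch, not the authors' code. The abstract does not say which NMF algorithm or objective is used, so the sketch assumes the standard Lee-Seung multiplicative updates for the squared Frobenius error, with the term-document matrix X approximated as U V^T. It also normalizes each column of U to unit length before reading off labels, which is an assumption. Each document is then assigned to the topic axis on which its row of V is largest. The helper names (`nmf`, `nmf_cluster`, `matmul`, `transpose`) and the toy matrix are hypothetical.

```python
import random
from math import sqrt


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply A (p x q) by B (q x r); matrices are lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative X (m terms x n docs) as U @ V^T.

    U (m x k) holds term weights per base topic; V (n x k) holds each
    document's coordinates on the k topic axes. Assumed method: Lee-Seung
    multiplicative updates for squared Frobenius error; eps avoids /0.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), elementwise
        num, den = matmul(X, V), matmul(U, matmul(transpose(V), V))
        U = [[U[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
        # V <- V * (X^T U) / (V U^T U), elementwise
        num, den = matmul(Xt, U), matmul(V, matmul(transpose(U), U))
        V = [[V[j][a] * num[j][a] / (den[j][a] + eps) for a in range(k)]
             for j in range(n)]
    # Give each column of U unit Euclidean length; scaling the matching
    # column of V by the same factor leaves U @ V^T unchanged.
    for a in range(k):
        norm = sqrt(sum(U[i][a] ** 2 for i in range(m))) or 1.0
        for i in range(m):
            U[i][a] /= norm
        for j in range(n):
            V[j][a] *= norm
    return U, V


def nmf_cluster(X, k, **kwargs):
    """Label each document (column of X) with its largest topic axis."""
    _, V = nmf(X, k, **kwargs)
    return [max(range(k), key=lambda a: row[a]) for row in V]


# Hypothetical 6-term x 6-document count matrix: terms 0-2 occur only
# in documents 0-2, terms 3-5 only in documents 3-5.
X = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 2, 1, 3],
    [0, 0, 0, 4, 3, 2],
    [0, 0, 0, 1, 2, 2],
]
labels = nmf_cluster(X, k=2)
print(labels)
```

Because the factorization starts from a random initialization, which group ends up as topic 0 depends on the seed. On this toy matrix, documents 0-2 are expected to share one label and documents 3-5 the other. For real corpora, a library routine such as scikit-learn's `NMF` could replace the hand-rolled update loop.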

Original language: English
Pages (from-to): 267-273
Number of pages: 7
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Issue number: Special issue
Status: Published (2003)
Externally published: Yes
Event: Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003, Toronto, Ont., Canada
Duration: 28 Jul 2003 to 1 Aug 2003

Keywords

  • Document Clustering
  • Non-negative Matrix Factorization
