Keyword Extraction and Headline Generation Using Novel Word Features

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

10 Scopus citations

Abstract

We introduce several novel word features for keyword extraction and headline generation. These features are derived from a document's background knowledge as supplied by Wikipedia. To acquire this background knowledge for a given document, we first generate a query for searching the Wikipedia corpus, based on the key facts present in the document. We then use the query to find Wikipedia articles that are closely related to the document's contents. From each article in the search result set, we extract the inlink, outlink, category, and infobox information and derive a set of novel word features that reflect the document's background knowledge. These features give valuable indications of the importance of individual words in the input document, and they complement the traditional word features derivable from a document's explicit information. In addition, we introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features experimentally and obtain very encouraging results.
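The feature-derivation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the features precisely, so everything here is an assumption made for illustration: the toy `ARTICLES` list stands in for the Wikipedia search results, the field names (`inlinks`, `outlinks`, `categories`, `infobox`) are hypothetical, and the "feature" is simply a count of how often each document word appears in each kind of field. Query generation and Wikipedia retrieval are skipped entirely.

```python
from collections import Counter
import re

# Hypothetical stand-in for the Wikipedia search result set.
# The structure and field names are assumptions, not the paper's data model.
ARTICLES = [
    {
        "title": "Machine learning",
        "inlinks": ["Artificial intelligence", "Statistics"],
        "outlinks": ["Neural network", "Statistics"],
        "categories": ["Machine learning", "Learning"],
        "infobox": {},
    },
    {
        "title": "Neural network",
        "inlinks": ["Machine learning"],
        "outlinks": ["Artificial intelligence"],
        "categories": ["Neural networks"],
        "infobox": {"field": "machine learning"},
    },
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def background_features(document, articles):
    """For each distinct document word, count its occurrences in the
    inlink, outlink, category, and infobox text of the given articles."""
    features = {word: Counter() for word in set(tokenize(document))}
    for article in articles:
        fields = {
            "inlink": " ".join(article["inlinks"]),
            "outlink": " ".join(article["outlinks"]),
            "category": " ".join(article["categories"]),
            "infobox": " ".join(str(v) for v in article["infobox"].values()),
        }
        for name, text in fields.items():
            for token in tokenize(text):
                if token in features:
                    features[token][name] += 1
    return features


if __name__ == "__main__":
    result = background_features("Machine learning uses statistics", ARTICLES)
    for word, counts in sorted(result.items()):
        print(word, dict(counts))
```

In this toy setup a word such as "uses" receives no counts at all, while "learning" is supported by several fields. This is one simple way a background-knowledge signal could separate content words from incidental ones; the features in the paper itself may be defined quite differently.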

Original language: English
Title of host publication: Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI 2010
Publisher: AAAI Press
Pages: 1461-1466
Number of pages: 6
ISBN (Electronic): 9781577354642
State: Published - 15 Jul 2010
Externally published: Yes
Event: 24th AAAI Conference on Artificial Intelligence, AAAI 2010 - Atlanta, United States
Duration: 11 Jul 2010 to 15 Jul 2010

Publication series

Name: Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI 2010

Conference

Conference: 24th AAAI Conference on Artificial Intelligence, AAAI 2010
Country/Territory: United States
City: Atlanta
Period: 11/07/10 to 15/07/10
