TY - GEN
T1 - Semantic Enhancement and Multi-level Label Embedding for Chinese News Headline Classification
AU - Qi, Jiangnan
AU - Rao, Yuan
AU - Sun, Ling
AU - Yang, Xiong
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - News headline classification is a specific case of short text classification, which aims to extract semantic information from short text and classify it accurately. It offers a fast way to classify data from many kinds of news media and has therefore attracted attention from both academia and industry. Most short text classification methods rely on semantic expansion with external knowledge, which cannot expand dynamically in real time or make full use of label information. To overcome these problems, we propose a novel method that consists of three parts: semantic enhancement, a multi-dimensional feature fusion network, and multi-level label embedding. First, word-level semantic information is embedded into the character encoding from a pre-trained model to enhance semantic features. Second, both a Bi-GRU and a multi-scale CNN are used to extract sequential and local features of the text to enhance the semantic representation of the sentence. Furthermore, multi-level label embedding is used to filter the text vector and assist classification at the word and sentence levels, respectively. Experimental results on the NLPCC 2017 Chinese news headline classification task show that our model achieves 84.74% accuracy and 84.75% F1, improving over the best baseline model by 1.5% and 1.6%, respectively, and reaching state-of-the-art performance.
AB - News headline classification is a specific case of short text classification, which aims to extract semantic information from short text and classify it accurately. It offers a fast way to classify data from many kinds of news media and has therefore attracted attention from both academia and industry. Most short text classification methods rely on semantic expansion with external knowledge, which cannot expand dynamically in real time or make full use of label information. To overcome these problems, we propose a novel method that consists of three parts: semantic enhancement, a multi-dimensional feature fusion network, and multi-level label embedding. First, word-level semantic information is embedded into the character encoding from a pre-trained model to enhance semantic features. Second, both a Bi-GRU and a multi-scale CNN are used to extract sequential and local features of the text to enhance the semantic representation of the sentence. Furthermore, multi-level label embedding is used to filter the text vector and assist classification at the word and sentence levels, respectively. Experimental results on the NLPCC 2017 Chinese news headline classification task show that our model achieves 84.74% accuracy and 84.75% F1, improving over the best baseline model by 1.5% and 1.6%, respectively, and reaching state-of-the-art performance.
KW - News headlines classification
KW - multi-dimensional feature fusion
KW - multi-level label embedding
KW - semantic enhance
UR - https://www.scopus.com/pages/publications/85083562332
U2 - 10.1109/iSAI-NLP48611.2019.9045404
DO - 10.1109/iSAI-NLP48611.2019.9045404
M3 - Conference contribution
AN - SCOPUS:85083562332
T3 - Proceedings - 2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2019
BT - Proceedings - 2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2019
Y2 - 30 October 2019 through 1 November 2019
ER -