TY - JOUR
T1 - Faceted Text Segmentation via Multitask Learning
AU - Wu, Bei
AU - Wei, Bifan
AU - Liu, Jun
AU - Wu, Kewei
AU - Wang, Meng
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2021/9
Y1 - 2021/9
N2 - Text segmentation is a fundamental step in natural language processing (NLP) and information retrieval (IR) tasks. Most existing approaches do not explicitly take into account the facet information of documents for segmentation. Text segmentation and facet annotation are often addressed as separate problems, but they operate in a common input space. This article proposes FTS, which is a novel model for faceted text segmentation via multitask learning (MTL). FTS models faceted text segmentation as an MTL problem with text segmentation and facet annotation. This model employs the bidirectional long short-term memory (Bi-LSTM) network to learn the feature representation of sentences within a document. The feature representation is shared and adjusted with common parameters by MTL, which can help an optimization model to learn a better-shared and robust feature representation from text segmentation to facet annotation. Moreover, the text segmentation is modeled as a sequence tagging task using LSTM with a conditional random fields (CRFs) classification layer. Extensive experiments are conducted on five data sets from five domains: data structure, data mining, computer network, solid mechanics, and crystallography. The results indicate that the FTS model outperforms several highly cited and state-of-the-art approaches related to text segmentation and facet annotation.
AB - Text segmentation is a fundamental step in natural language processing (NLP) and information retrieval (IR) tasks. Most existing approaches do not explicitly take into account the facet information of documents for segmentation. Text segmentation and facet annotation are often addressed as separate problems, but they operate in a common input space. This article proposes FTS, which is a novel model for faceted text segmentation via multitask learning (MTL). FTS models faceted text segmentation as an MTL problem with text segmentation and facet annotation. This model employs the bidirectional long short-term memory (Bi-LSTM) network to learn the feature representation of sentences within a document. The feature representation is shared and adjusted with common parameters by MTL, which can help an optimization model to learn a better-shared and robust feature representation from text segmentation to facet annotation. Moreover, the text segmentation is modeled as a sequence tagging task using LSTM with a conditional random fields (CRFs) classification layer. Extensive experiments are conducted on five data sets from five domains: data structure, data mining, computer network, solid mechanics, and crystallography. The results indicate that the FTS model outperforms several highly cited and state-of-the-art approaches related to text segmentation and facet annotation.
KW - Facet annotation
KW - long short-term memory
KW - multitask learning (MTL)
KW - text segmentation
UR - https://www.scopus.com/pages/publications/85114307704
U2 - 10.1109/TNNLS.2020.3015996
DO - 10.1109/TNNLS.2020.3015996
M3 - 文章
C2 - 32894723
AN - SCOPUS:85114307704
SN - 2162-237X
VL - 32
SP - 3846
EP - 3857
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 9
M1 - 9187579
ER -