TY - JOUR
T1 - Clause-aware extractive summarization with topical decoupled contrastive learning
AU - Wang, Peiyuan
AU - Yu, Yajie
AU - Li, Yibao
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2024/3
Y1 - 2024/3
N2 - Sentence-level extractive summaries inevitably contain redundant information, because the selected sentences often include uninformative phrases or overly detailed expressions. Extracting finer-grained units, in turn, is intended to retain semantic integrity. To balance text redundancy against semantic integrity, we propose a novel clause-aware summarization model (TDCL-ClauseSum). We split complex sentences into clauses that are grammatically independent but semantically dependent. Clauses serve as the extraction units, and a graph neural network combined with topical information captures clause-level relationships. A decoupled contrastive loss is then stacked on top of the neural model to bridge the gap between topic prediction and clause classification. TDCL-ClauseSum is evaluated on two public benchmark datasets, CNN/Daily Mail and New York Times, which contain 310574 and 150536 samples, respectively. Experiments show that our method achieves strong performance on both datasets (ROUGE-1/ROUGE-2/ROUGE-L: 43.94/20.65/40.75 on CNN/Daily Mail and 49.69/29.84/43.01 on New York Times). These results demonstrate the advantage of clause-level extraction.
AB - Sentence-level extractive summaries inevitably contain redundant information, because the selected sentences often include uninformative phrases or overly detailed expressions. Extracting finer-grained units, in turn, is intended to retain semantic integrity. To balance text redundancy against semantic integrity, we propose a novel clause-aware summarization model (TDCL-ClauseSum). We split complex sentences into clauses that are grammatically independent but semantically dependent. Clauses serve as the extraction units, and a graph neural network combined with topical information captures clause-level relationships. A decoupled contrastive loss is then stacked on top of the neural model to bridge the gap between topic prediction and clause classification. TDCL-ClauseSum is evaluated on two public benchmark datasets, CNN/Daily Mail and New York Times, which contain 310574 and 150536 samples, respectively. Experiments show that our method achieves strong performance on both datasets (ROUGE-1/ROUGE-2/ROUGE-L: 43.94/20.65/40.75 on CNN/Daily Mail and 49.69/29.84/43.01 on New York Times). These results demonstrate the advantage of clause-level extraction.
KW - Clause selection
KW - Decoupled contrastive loss
KW - Extractive summarization
KW - Graph neural network
KW - Neural topic model
UR - https://www.scopus.com/pages/publications/85177995274
U2 - 10.1016/j.ipm.2023.103586
DO - 10.1016/j.ipm.2023.103586
M3 - Article
AN - SCOPUS:85177995274
SN - 0306-4573
VL - 61
JO - Information Processing and Management
JF - Information Processing and Management
IS - 2
M1 - 103586
ER -