TY - GEN
T1 - LMBot
T2 - 17th ACM International Conference on Web Search and Data Mining, WSDM 2024
AU - Cai, Zijian
AU - Tan, Zhaoxuan
AU - Lei, Zhenyu
AU - Zhu, Zifeng
AU - Wang, Hongrui
AU - Zheng, Qinghua
AU - Luo, Minnan
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/3/4
Y1 - 2024/3/4
N2 - As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighboring users multiple hops away from the targets, and fetching neighbors is time-consuming and may introduce sampling bias. At the same time, our experiments reveal that after fine-tuning on the Twitter bot detection task, pretrained language models achieve competitive performance while not requiring a graph structure during deployment. Inspired by this finding, we propose a novel bot detection framework, LMBot, that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection to combat the data dependency challenge. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed it into the LM for domain adaptation. For graph-based datasets, the output of the LM serves as input features for the GNN, enabling LMBot to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process. Armed with the LM, we can perform graph-less inference with graph knowledge, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which also shows strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient than existing graph-based Twitter bot detection methods.
AB - As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighboring users multiple hops away from the targets, and fetching neighbors is time-consuming and may introduce sampling bias. At the same time, our experiments reveal that after fine-tuning on the Twitter bot detection task, pretrained language models achieve competitive performance while not requiring a graph structure during deployment. Inspired by this finding, we propose a novel bot detection framework, LMBot, that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection to combat the data dependency challenge. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed it into the LM for domain adaptation. For graph-based datasets, the output of the LM serves as input features for the GNN, enabling LMBot to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process. Armed with the LM, we can perform graph-less inference with graph knowledge, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which also shows strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient than existing graph-based Twitter bot detection methods.
KW - knowledge distillation
KW - social network analysis
KW - Twitter bot detection
UR - https://www.scopus.com/pages/publications/85191745266
U2 - 10.1145/3616855.3635843
DO - 10.1145/3616855.3635843
M3 - Conference contribution
AN - SCOPUS:85191745266
T3 - WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining
SP - 57
EP - 66
BT - WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
Y2 - 4 March 2024 through 8 March 2024
ER -