LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

19 Scopus citations

Abstract

As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighbor users multiple hops away from the targets, and that fetching neighbors is time-consuming and may introduce sampling bias. At the same time, our experiments reveal that after finetuning on the Twitter bot detection task, pretrained language models achieve competitive performance without requiring a graph structure during deployment. Inspired by this finding, we propose a novel bot detection framework, LMBot, that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection to combat the data dependency challenge. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed these sequences into the LM for domain adaptation. For graph-based datasets, the output of the LM serves as input features for the GNN, enabling LMBot to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process. Armed with the LM, we can perform graph-less inference with graph knowledge, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which also shows strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient than existing graph-based Twitter bot detection methods.
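
The abstract does not state the exact training objective, so the following is only a minimal sketch of how a graph-based teacher's predictions could be distilled into a language-model student. It assumes a common formulation of knowledge distillation: a temperature-softened KL term toward the teacher plus a cross-entropy term on the ground-truth label. The function names (`softmax`, `kl_divergence`, `distillation_loss`) and the parameters `temperature` and `alpha` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed objective: alpha * KL(teacher || student) + (1 - alpha) * CE.

    student_logits: scores from the LM (student) for one user
    teacher_logits: scores from the GNN or MLP (teacher) for the same user
    label:          index of the true class (e.g. 0 = human, 1 = bot)
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_term = kl_divergence(teacher_soft, student_soft)
    hard_term = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * soft_term + (1 - alpha) * hard_term


if __name__ == "__main__":
    # A student that agrees with the teacher incurs a lower loss
    # than one that contradicts it.
    agree = distillation_loss([2.0, -2.0], [2.0, -2.0], label=0)
    disagree = distillation_loss([-2.0, 2.0], [2.0, -2.0], label=0)
    print(agree < disagree)
```

In a sketch like this, only the student is needed at inference time, which is the sense in which deployment can be graph-less: the teacher's graph-derived predictions influence the student during training but are not consulted afterwards.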

Original language: English
Title of host publication: WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 57-66
Number of pages: 10
ISBN (Electronic): 9798400703713
State: Published - 4 Mar 2024
Event: 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 - Merida, Mexico
Duration: 4 Mar 2024 to 8 Mar 2024

Publication series

Name: WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining

Conference

Conference: 17th ACM International Conference on Web Search and Data Mining, WSDM 2024
Country/Territory: Mexico
City: Merida
Period: 4/03/24 to 8/03/24

Keywords

  • knowledge distillation
  • social network analysis
  • twitter bot detection
