
TEDM-PU: A Tax Evasion Detection Method Based on Positive and Unlabeled Learning

  • Yingchao Wu
  • Qinghua Zheng
  • Yuda Gao
  • Bo Dong
  • Rongzhe Wei
  • Fa Zhang
  • Huan He
  • Xi'an Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

12 Scopus citations

Abstract

Tax evasion detection plays a crucial role in reducing tax revenue loss, and many efforts have been made to develop detection models based on machine learning techniques. Training an effective model to detect tax evaders requires a large amount of data, and in particular sufficient labeled data. However, annotation is expensive and time-consuming, so only a small amount of labeled data is available, which makes the development of detection models difficult. To address this issue, we propose a tax evasion detection method based on positive and unlabeled learning (TEDM-PU), which identifies tax evasion using a limited number of annotated tax evaders and a large amount of unlabeled data. The TEDM-PU framework consists of three stages: a preprocessing stage that extracts taxpayer features based on random forest, a pseudo-labeling stage that assigns pseudo labels to unlabeled samples based on PUAdapter, and a model training stage based on LightGBM. To evaluate the effectiveness of the proposed TEDM-PU, we conduct experiments on real-world tax data. The results demonstrate that the TEDM-PU method can detect tax evaders with higher accuracy and better interpretability than state-of-the-art methods.
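
The pseudo-labeling stage can be illustrated with a minimal sketch. It assumes that PUAdapter applies the Elkan and Noto style correction commonly used in positive-unlabeled learning: a classifier is trained to separate labeled from unlabeled samples, its mean score on held-out labeled positives estimates the label frequency c, and scores on unlabeled samples are divided by c before thresholding. The function names, the example scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean


def estimate_label_frequency(holdout_positive_scores):
    """Estimate c = P(labeled | positive) as the mean classifier score
    on held-out labeled positives (Elkan-Noto style estimator)."""
    if not holdout_positive_scores:
        raise ValueError("need at least one held-out labeled positive")
    return mean(holdout_positive_scores)


def pseudo_label(unlabeled_scores, c, threshold=0.5):
    """Rescale g(x) = P(labeled | x) to an estimate of P(positive | x)
    by dividing by c, clip at 1.0, and threshold to get pseudo labels."""
    labels = []
    for g in unlabeled_scores:
        p_positive = min(g / c, 1.0)
        labels.append(1 if p_positive >= threshold else 0)
    return labels


# Hypothetical scores from a labeled-vs-unlabeled classifier.
holdout_positive_scores = [0.5, 0.25, 0.75]
unlabeled_scores = [0.125, 0.25, 0.5, 0.0]

c = estimate_label_frequency(holdout_positive_scores)
pseudo_labels = pseudo_label(unlabeled_scores, c)
```

In a pipeline like the one described, the resulting pseudo labels would be combined with the labeled tax evaders to train the final classifier; how TEDM-PU selects or weights pseudo-labeled samples is not stated in the abstract.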

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1681-1686
Number of pages: 6
ISBN (Electronic): 9781728108582
State: Published - Dec 2019
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: 9 Dec 2019 to 12 Dec 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country/Territory: United States
City: Los Angeles
Period: 9/12/19 to 12/12/19

Keywords

  • PU learning
  • interpretability
  • tax evasion detection
