TY - GEN
T1 - An Unsupervised-Learning Based Method for Detecting Groups of Malicious Web Crawlers in Internet
AU - Yue, Tianyi
AU - Zhou, Yadong
AU - Hu, Bowen
AU - Xu, Zhanbo
AU - Guan, Xiaohong
AU - Zhou, Hao
AU - Liu, Ting
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/8/23
Y1 - 2021/8/23
AB - Malicious web crawlers are a serious threat to the security and performance of web servers on the Internet. Such crawlers systematically retrieve large numbers of web pages without approval, which may amount to theft of data assets. In this paper, we propose an unsupervised-learning-based method for detecting malicious web crawlers. The method has three phases. First, it generates a representative vector for each client by combining the statistics of its visiting behavior with its page request stream. Second, a new subspace clustering algorithm groups the clients into clusters. Finally, four metrics are used to identify the groups of malicious web crawlers. The method is validated on a real data set of 580 thousand access requests. Experimental results show that it detects malicious web crawlers with a true positive rate (TPR) of 91.0% and a false positive rate (FPR) of 1.3%.
UR - https://www.scopus.com/pages/publications/85116976415
U2 - 10.1109/CASE49439.2021.9551622
DO - 10.1109/CASE49439.2021.9551622
M3 - Conference contribution
AN - SCOPUS:85116976415
T3 - IEEE International Conference on Automation Science and Engineering
SP - 367
EP - 372
BT - 2021 IEEE 17th International Conference on Automation Science and Engineering, CASE 2021
PB - IEEE Computer Society
T2 - 17th IEEE International Conference on Automation Science and Engineering, CASE 2021
Y2 - 23 August 2021 through 27 August 2021
ER -