TY - JOUR
T1 - From big data to knowledge
T2 - A spatio-temporal approach to malware detection
AU - Mao, Weixuan
AU - Cai, Zhongmin
AU - Yang, Yuan
AU - Shi, Xiaohong
AU - Guan, Xiaohong
N1 - Publisher Copyright: © 2018 Elsevier Ltd
PY - 2018/5
Y1 - 2018/5
N2 - The deployment of endpoint protection has gradually migrated from individual clients to remote cloud servers, an arrangement termed cloud-based security service. This new paradigm of security defense produces a large amount of data and log files, and motivates data-driven techniques for detecting malicious software. This paper conducts an empirical study on the log of a real cloud-based security service to characterize the occurrence of executable files on end hosts; the log covers 124,782 benign and 113,305 malicious executable files that occurred on 165,549,417 end hosts. The end hosts on which an executable file occurs and the timestamps at which it occurs provide insights into the distribution of software in the wild from spatial and temporal perspectives, respectively. We also investigate the strategies behind these characterizations, and observe a preferential attachment process and periodicity in file occurrence on end hosts. The different occurrence patterns observed for benign and malicious files inspire a new scalable approach to malware detection. We learn from the characterizations that associated files sharing more spatial and temporal information are more likely to have the same label, either benign or malicious. Thus, we devise a graph-based semi-supervised learning algorithm for real-time malware detection that takes into account the spatio-temporal information of the distribution of executable files. Experimental results demonstrate that our approach improves malware detection performance by 14.7% on average over previous techniques.
KW - Content-agnostic
KW - Data-driven security analysis
KW - File co-occurrence
KW - Graph based semi-supervised learning
KW - Malware detection
UR - https://www.scopus.com/pages/publications/85041392648
U2 - 10.1016/j.cose.2017.12.005
DO - 10.1016/j.cose.2017.12.005
M3 - Article
AN - SCOPUS:85041392648
SN - 0167-4048
VL - 74
SP - 167
EP - 183
JO - Computers and Security
JF - Computers and Security
ER -