TY - GEN
T1 - PyAnalyzer
T2 - 46th ACM/IEEE International Conference on Software Engineering, ICSE 2024
AU - Jin, Wuxia
AU - Xu, Shuo
AU - Chen, Dawei
AU - He, Jiajun
AU - Zhong, Dinghong
AU - Fan, Ming
AU - Chen, Hongxu
AU - Zhang, Huijia
AU - Liu, Ting
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/5/20
Y1 - 2024/5/20
AB - Dependency extraction based on static analysis lays the groundwork for a wide range of applications. However, dynamic language features in Python make code behaviors obscure and nondeterministic; consequently, it poses huge challenges for static analyses to resolve symbol-level dependencies. Although prosperous techniques and tools are adequately available, they still lack sufficient capabilities to handle object changes, first-class citizens, varying call sites, and library dependencies. To address the fundamental difficulty for dynamic languages, this work proposes an effective and practical method namely PyAnalyzer for dependency extraction. PyAnalyzer uniformly models functions, classes, and modules into first-class heap objects, propagating the dynamic changes of these objects and class inheritance. This manner better simulates dynamic features like duck typing, object changes, and first-class citizens, resulting in high recall results without compromising precision. Moreover, PyAnalyzer leverages optional type annotations as a shortcut to express varying call sites and resolve library dependencies on demand. We collected two micro-benchmarks (278 small programs), two macro-benchmarks (59 real-world applications), and 191 real-world projects (10MSLOC) for comprehensive comparisons with 7 advanced techniques (i.e., Understand, Sourcetrail, Depends, ENRE19, PySonar2, PyCG, and Type4Py). The results demonstrated that PyAnalyzer achieves a high recall and hence improves the F1 by 24.7% on average, at least 1.4x faster without an obvious compromise of memory efficiency. Our work will benefit diverse client applications.
KW - Dependency Extraction
KW - Dynamic Features
KW - Python
UR - https://www.scopus.com/pages/publications/85196770077
U2 - 10.1145/3597503.3640325
DO - 10.1145/3597503.3640325
M3 - Conference contribution
AN - SCOPUS:85196770077
T3 - Proceedings - International Conference on Software Engineering
SP - 1372
EP - 1383
BT - Proceedings - 2024 ACM/IEEE 46th International Conference on Software Engineering, ICSE 2024
PB - IEEE Computer Society
Y2 - 14 April 2024 through 20 April 2024
ER -