TY - JOUR
T1 - Pride
T2 - Prioritizing Documentation Effort Based on a PageRank-Like Algorithm and Simple Filtering Rules
AU - Pan, Weifeng
AU - Ming, Hua
AU - Kim, Dae Kyoo
AU - Yang, Zijiang
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2023/3/1
Y1 - 2023/3/1
N2 - Code documentation can be helpful in many software quality assurance tasks. However, due to resource constraints (e.g., time, human resources, and budget), programmers often cannot document their work completely and in a timely manner. In the literature, two approaches (one supervised and the other unsupervised) have been proposed to prioritize documentation effort so that the most important classes are documented first. However, both have limitations. The supervised approach relies heavily on a difficult-to-obtain labeled data set and has a high computational cost. The unsupervised one depends on a graph representation of the software structure, which is inaccurate because it neglects many important couplings between classes. In this paper, we propose an improved approach, named Pride, to prioritize documentation effort. First, Pride uses a weighted directed class coupling network to precisely describe classes and their couplings. Second, we propose a PageRank-like algorithm to quantify the importance of classes in the whole class coupling network. Third, we use a set of software metrics to quantify source code complexity and further propose a simple, easy-to-apply filtering rule. Fourth, we sort all the classes by importance in descending order and use the filtering rule to filter out unimportant classes. Finally, a threshold k is applied, and the top-k% ranked classes are identified as the important classes to be documented first. Empirical results on a set of nine software systems show that, according to the average ranking of the Friedman test, Pride is superior to the existing approaches across the whole data set.
AB - Code documentation can be helpful in many software quality assurance tasks. However, due to resource constraints (e.g., time, human resources, and budget), programmers often cannot document their work completely and in a timely manner. In the literature, two approaches (one supervised and the other unsupervised) have been proposed to prioritize documentation effort so that the most important classes are documented first. However, both have limitations. The supervised approach relies heavily on a difficult-to-obtain labeled data set and has a high computational cost. The unsupervised one depends on a graph representation of the software structure, which is inaccurate because it neglects many important couplings between classes. In this paper, we propose an improved approach, named Pride, to prioritize documentation effort. First, Pride uses a weighted directed class coupling network to precisely describe classes and their couplings. Second, we propose a PageRank-like algorithm to quantify the importance of classes in the whole class coupling network. Third, we use a set of software metrics to quantify source code complexity and further propose a simple, easy-to-apply filtering rule. Fourth, we sort all the classes by importance in descending order and use the filtering rule to filter out unimportant classes. Finally, a threshold k is applied, and the top-k% ranked classes are identified as the important classes to be documented first. Empirical results on a set of nine software systems show that, according to the average ranking of the Friedman test, Pride is superior to the existing approaches across the whole data set.
KW - Code documentation
KW - PageRank
KW - program comprehension
KW - software maintenance
KW - software metrics
UR - https://www.scopus.com/pages/publications/85129670613
U2 - 10.1109/TSE.2022.3171469
DO - 10.1109/TSE.2022.3171469
M3 - Article
AN - SCOPUS:85129670613
SN - 0098-5589
VL - 49
SP - 1118
EP - 1151
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 3
ER -