Abstract
Software vulnerabilities are now reported unprecedentedly due to the recent development of automated vulnerability hunting tools. However, fixing vulnerabilities still mainly depends on programmers' manual efforts. Developers need to deeply understand the vulnerability and affect the system's functions as little as possible. In this paper, with the advancement of Neural Machine Translation (NMT) techniques, we provide a novel approach called SeqTrans to exploit historical vulnerability fixes to provide suggestions and automatically fix the source code. To capture the contextual information around the vulnerable code, we propose to leverage data-flow dependencies to construct code sequences and feed them into the state-of-the-art transformer model. The fine-tuning strategy has been introduced to overcome the small sample size problem. We evaluate SeqTrans on a dataset containing 1,282 commits that fix 624 CVEs in 205 Java projects. Results show that the accuracy of SeqTrans outperforms the latest techniques and achieves 23.3% in statement-level fix and 25.3% in CVE-level fix. In the meantime, we look deep inside the result and observe that the NMT model performs very well in certain kinds of vulnerabilities like CWE-287 (Improper Authentication) and CWE-863 (Incorrect Authorization).
| Original language | English |
|---|---|
| Pages (from-to) | 564-585 |
| Number of pages | 22 |
| Journal | IEEE Transactions on Software Engineering |
| Volume | 49 |
| Issue number | 2 |
| DOIs | |
| State | Published - 1 Feb 2023 |
Keywords
- Machine learning
- neural machine translation
- software engineering
- vulnerability fix
Fingerprint
Dive into the research topics of 'SeqTrans: Automatic Vulnerability Fix Via Sequence to Sequence Learning'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver