Abstract
We consider the problem of learning to repair erroneous C programs by learning optimal alignments with correct programs. Because previous approaches fix a single error in a line, they must repeat the fixing process until no errors remain. In this work, we propose a novel sequence-to-sequence learning framework for fixing multiple program errors at a time. We introduce an edit-distance-based data labeling approach for program error correction. Instead of labeling a program repair example by pairing an erroneous program with a line fix, we label the example by pairing an erroneous program with an optimal alignment to the corresponding correct program produced by the edit-distance computation. We evaluate our proposed approach on a publicly available dataset (the DeepFix dataset) that consists of erroneous C programs submitted by novice programming students. On a set of 6,975 erroneous C programs from the DeepFix dataset, our approach achieves the state-of-the-art full repair rate (without extra data such as compiler error messages or additional source code for pre-training).
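To make the labeling idea concrete, the sketch below computes a token-level optimal alignment between an erroneous program and its corrected version using the standard Levenshtein dynamic program with a backtrace. It is only an illustration: the tokenization, the unit edit costs, the operation names (`keep`, `replace`, `insert`, `delete`), and the function name `align` are assumptions made here, not the label format or implementation used in the paper.

```python
def align(src, tgt):
    """Return one minimum-cost edit script that turns token list src into tgt.

    Each entry is (operation, source_token, target_token). Unit costs for
    insert, delete, and replace are an assumption of this sketch.
    """
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete src[i-1]
                dist[i][j - 1] + 1,        # insert tgt[j-1]
                dist[i - 1][j - 1] + sub,  # keep or replace
            )

    # Walk back from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                name = "keep" if sub == 0 else "replace"
                ops.append((name, src[i - 1], tgt[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", src[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, tgt[j - 1]))
            j -= 1
    ops.reverse()
    return ops


# Hypothetical example: a C fragment missing ';' and ')'.
broken = ["int", "x", "=", "0", "printf", "(", "x", ";"]
fixed = ["int", "x", "=", "0", ";", "printf", "(", "x", ")", ";"]
ops = align(broken, fixed)
print([op for op in ops if op[0] != "keep"])
```

A single alignment of this kind records every edit needed to reach the correct program, which is what lets a training example cover several errors at once rather than one line fix per example.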
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics, Findings of ACL |
Subtitle of host publication | EMNLP 2021 |
Editors | Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4850-4855 |
Number of pages | 6 |
ISBN (Electronic) | 9781955917100 |
Publication status | Published - 2021 |
Event | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic Duration: 2021 Nov 7 → 2021 Nov 11 |
Publication series
Name | Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
---|
Conference
Conference | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
---|---|
Country/Territory | Dominican Republic |
City | Punta Cana |
Period | 2021 Nov 7 → 2021 Nov 11 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics.
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language