Can We Automatically Fix Bugs by Learning Edit Operations?
Paper
Summary:
- Implements Hephaestus, a novel method that aims to improve the accuracy of automated program repair (APR) by learning to apply edit operations. It leverages neural machine translation (NMT) and attempts to produce the edit operations needed to turn buggy code into fixed code.
- Learning edit operations does not offer an advantage over the standard approach of translating directly from buggy code to fixed code. Even so, Hephaestus, despite exhibiting lower translation accuracy than the baseline, was still able to perform successful bug repair.
Points:
- Introduction
- The naive approach uses some sort of comparison algorithm that identifies the type of bug and replaces it with a prescribed fix
- Time Consuming
- Learning approach using neural machine translation
- Directly applying the NMT approach to source code is inefficient
- Many bug fixes involve changes to only a few statements
- Translating the whole method therefore results in suboptimal performance
- Prior work attempts to mitigate the inefficiency by predicting the specific statement to change on the AST
- Operating on individual tokens would be more effective
- Hephaestus leverages NMT to predict edit operations derived from the Levenshtein distance algorithm
- Works at the token level of source code
- Works on any language without language-specific parsers
- Related Works
- Tufano et al.
- Repairing code through identification of bug-fix patterns in large software repositories
- Uses a deep learning approach to learn “meaningful” code changes
- Chen et al.
- Focus on single-line bugs
- Jiang et al.
- Two limitations of NMT-based repair: the correct fix for a given bug may not exist within the model’s output space, and the model lacks awareness of syntax
- Remedy: pre-train the model on the programming language in question
- Yuan and Banzhaf
- Group fine-granularity edits into larger statement-level edits
- Mousavi et al.
- Obstacles to APR: overfitting, and a disparity between the predicted bug and the fix operation; an ideal tool would mimic a human software developer
- Background
- Bug repair can be framed as a translation problem: buggy code is translated into fixed code, much as traditional NMT translates between natural languages
- Traditional translation replaces the majority of the input sequence, which is natural language
- A fix in code, by contrast, might be minimal
- Unlike a natural-language translation, a repair translation should not have the same meaning as its input
- Approach
- Levenshtein Edit Operation
- The bug is the input sequence; the NMT model attempts to produce the edit operations that transform it into the fix
- Basic Operations
- Insertion
- Deletion
- Replacement
- Compound Edit Operations
- A group of one or more edit operations, applied as a sequence
- Condensing: the process of grouping edit operations into compound operations
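The basic operations above can be illustrated with a minimal Python sketch that derives insert, delete, and replace operations between two token sequences from the Levenshtein dynamic-programming table. The names `levenshtein_ops` and `apply_ops`, and the `(flavor, index, token)` tuple layout, are assumptions made for illustration and are not the paper's implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (flavor, index, token); token is None for deletions.
EditOp = Tuple[str, int, Optional[str]]


def levenshtein_ops(bug: List[str], fix: List[str]) -> List[EditOp]:
    """Return a minimal list of ins/del/rep operations turning `bug` into `fix`.

    Indices refer to positions in the original `bug` sequence, and the list is
    ordered from the end of the sequence to the start, so the operations can be
    applied one after another without shifting later indices.
    """
    m, n = len(bug), len(fix)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if bug[i - 1] == fix[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or replacement

    # Walk back from the bottom-right corner, recording each non-match step.
    ops: List[EditOp] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and bug[i - 1] == fix[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("rep", i - 1, fix[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", i, fix[j - 1]))
            j -= 1
    return ops


def apply_ops(bug: List[str], ops: List[EditOp]) -> List[str]:
    """Apply operations produced by `levenshtein_ops` to a copy of `bug`."""
    tokens = list(bug)
    for flavor, index, token in ops:
        if flavor == "ins":
            tokens.insert(index, token)
        elif flavor == "del":
            del tokens[index]
        elif flavor == "rep":
            tokens[index] = token
        else:
            raise ValueError(f"unknown flavor: {flavor}")
    return tokens


bug = "if ( x = 0 ) return ;".split()
fix = "if ( x == 0 ) return ;".split()
ops = levenshtein_ops(bug, fix)
print(ops)
```

In this example a single replacement at token index 3 repairs the method, which is the kind of minimal fix that motivates predicting edit operations instead of regenerating the whole token sequence.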
- Dataset Construction
- Control Dataset: baseline, not involved with edit operations
- Machine String: edit operations are transformed into machine strings so that they can be included in the training data
- Typed
- General
- We make the distinction between typed and general form to determine if the form of machine string used during training affects the Hephaestus models’ abilities to learn edit operations.
- Experimental Dataset
- Each entry pairs a bug with its corresponding fix, along with the Levenshtein edit distance between the bug and the fix
- Basic is the sequence of all basic compound operations that transforms the bug into the fix, strict is the minimal such sequence of strict compound operations, and loose is the minimal such sequence of loose compound operations
- Model Construction
- LSTM+General
- GRU+General
- LSTM+Typed
- The CEC (constant error carousel) ensures that error signals fed forward into the LSTM layers and backpropagated to the LSTM layers are resistant to the effects of the vanishing gradient problem.
- Experimental Design
- Perfect Prediction Accuracy
- Failed Prediction Rate
- Edit Distance Decrease
- Training Accuracy
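A rough sketch of how the first three metrics could be computed over a test set is given below. The definitions are assumptions inferred from the metric names and the results in these notes: PPA is taken as the share of predictions identical to the fix, FPR as the share of model outputs that could not be decoded into code (represented here as `None`), and EDD as `distance(bug, fix) - distance(prediction, fix)`, so a negative value means the prediction moved away from the fix. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance using a two-row DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or replace
        prev = curr
    return prev[-1]


def evaluate(bugs: List[List[str]],
             fixes: List[List[str]],
             predictions: List[Optional[List[str]]]) -> Dict[str, float]:
    """Compute PPA, FPR, and average EDD under the assumed definitions above.

    A prediction of None stands for a failed prediction and is excluded
    from the EDD average (an assumption of this sketch).
    """
    total = len(bugs)
    perfect = sum(1 for p, f in zip(predictions, fixes) if p is not None and p == f)
    failed = sum(1 for p in predictions if p is None)
    decreases = [edit_distance(b, f) - edit_distance(p, f)
                 for b, f, p in zip(bugs, fixes, predictions) if p is not None]
    avg_edd = sum(decreases) / len(decreases) if decreases else 0.0
    return {"ppa": perfect / total, "fpr": failed / total, "avg_edd": avg_edd}


bugs = [["a", "=", "b"], ["x"], ["y", "z"]]
fixes = [["a", "==", "b"], ["x", "y"], ["y"]]
predictions = [["a", "==", "b"], None, ["y", "z", "w"]]
metrics = evaluate(bugs, fixes, predictions)
print(metrics)
```

In the toy data the first prediction is perfect, the second failed, and the third drifted one token further from the fix than the bug was, so the average EDD of the two decodable predictions is zero.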
- Result
- PPA: The control model (baseline model) outperformed the rest, though not by much
- FPR: The control model never produced a failed prediction, because its output string can always be interpreted as a sequence of Java method tokens
- EDD: On average, every model generated “bug fixes” that were further away from the fixed code than the original buggy code was
- Training Accuracy: Every model exceeded 90%
- RQ
- RQ1: Is learning edit operations an effective approach to automatic bug repair?
- Learning edit operations does not offer advantages over the baseline approach. The experimental Hephaestus models must determine a sequence of edit operations, decode them, and apply them to the input buggy method in order to predict fixed source code, whereas the baseline translates the buggy code to fixed code directly
- RQ2: What effect does each condensing strategy and machine string form have on the accuracy of bug repair?
- The differences in PPA between the basic, strict, and loose models are negligible, but the models differ in training accuracy and average EDD. Despite having significantly lower final training accuracy, the strict and loose models had slightly more positive EDD values than the basic models (a difference of about 0.96). This is taken as evidence that condensing edit operations into strict and loose forms is beneficial compared with not condensing them at all
- RQ3: What is the effect of using an LSTM-based architecture versus a GRU-based architecture on the accuracy of bug repair?
- The variation is not meaningful enough to count as a key difference between the models.
- Future Work
- It was determined that most failed predictions were caused by generated indices outside the valid range for a given string. What changes can be made to this model to restrict the prediction range?
- Does changing the abstraction method of the training dataset affect this metric?
- Other NLP Tools
- Extra software layers in addition to the methods presented in our study.
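The failure mode noted under Future Work, generated indices that fall outside the valid range for a given string, can be illustrated with a short sketch. It assumes a hypothetical `(flavor, index, token)` representation of decoded edit operations and returns `None` when any index is invalid. It only shows how such a failure would be detected, not how the model's prediction range should be restricted.

```python
from typing import List, Optional, Tuple

# Hypothetical decoded edit operation: (flavor, index, token).
EditOp = Tuple[str, int, Optional[str]]


def apply_or_fail(tokens: List[str], ops: List[EditOp]) -> Optional[List[str]]:
    """Apply ops in order; return None (a failed prediction) on any invalid op."""
    out = list(tokens)
    for flavor, index, token in ops:
        # Insertions may target one position past the end; del/rep may not.
        limit = len(out) + 1 if flavor == "ins" else len(out)
        if not 0 <= index < limit:
            return None
        if flavor == "ins":
            out.insert(index, token)
        elif flavor == "del":
            del out[index]
        elif flavor == "rep":
            out[index] = token
        else:
            return None  # unknown flavor also counts as a failure
    return out


print(apply_or_fail(["a", "b"], [("ins", 2, "c")]))  # valid: append at the end
print(apply_or_fail(["a", "b"], [("rep", 5, "c")]))  # index out of range
```

Counting the `None` results over a test set would give a failed prediction rate in the sense assumed here.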
- Conclusion
- The introduction of these specific methods for training NMT-based systems to learn bug fixes did not provide a benefit to the task
- Edit operations are capable of performing automated bug repair to some degree
Knowledge:
- NMT (Neural Machine Translation): uses an artificial neural network to predict the likelihood of a sequence of words
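- Levenshtein Distance Algorithm: A string metric for measuring the difference between two sequences, defined as the minimum number of single-element insertions, deletions, and replacements needed to turn one into the other.
- Note on the results: the experimental models' weaker performance is likely due to them experiencing higher entropy than the control when making predictions.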
Terminology:
- Condensing Strategies:
- Basic Condensing: each basic compound operation corresponds to exactly one change
- Loose Condensing: a compound operation is loosely compatible iff applying its constituent operations is equivalent to applying some single operation
- That is, the operations together modify a contiguous section of tokens
- Strict Condensing: a compound operation is strictly compatible iff it is loosely compatible and every operation in it is of the same flavor
- Machine Strings:
- Typed: the machine string includes a flavor field f, which is one of ins, del, or rep, depending on whether the represented edit operation is an insertion, deletion, or replacement, respectively
- General: general-form machine strings do not explicitly store the flavor of their represented edit operations
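The three condensing strategies defined above can be sketched as a grouping pass over edit operations. This is a simplified reading, not the paper's algorithm: operations are hypothetical `(flavor, index, token)` tuples indexed against the original buggy sequence, an insertion at index `i` is treated as touching the gap before token `i`, and two operations count as contiguous when the next one starts no later than where the current group ends.

```python
from typing import List, Optional, Tuple

# Hypothetical edit operation: (flavor, index, token), indexed against the bug.
EditOp = Tuple[str, int, Optional[str]]


def span(op: EditOp) -> Tuple[int, int]:
    """Half-open token range touched by an op (empty range for insertions)."""
    flavor, index, _ = op
    return (index, index) if flavor == "ins" else (index, index + 1)


def condense(ops: List[EditOp], strategy: str) -> List[List[EditOp]]:
    """Group ops into compound operations.

    strategy: "basic"  -> one op per group
              "loose"  -> merge ops touching a contiguous section of tokens
              "strict" -> like loose, but only ops of the same flavor merge
    """
    if strategy not in ("basic", "loose", "strict"):
        raise ValueError(f"unknown strategy: {strategy}")
    groups: List[List[EditOp]] = []
    end = 0  # end of the token range covered by the current group
    for op in sorted(ops, key=span):
        start, stop = span(op)
        mergeable = (
            strategy != "basic"
            and bool(groups)
            and start <= end
            and (strategy == "loose" or groups[-1][-1][0] == op[0])
        )
        if mergeable:
            groups[-1].append(op)
            end = max(end, stop)
        else:
            groups.append([op])
            end = stop
    return groups


ops = [("del", 2, None), ("del", 3, None), ("ins", 4, "x"), ("rep", 7, "y")]
for strategy in ("basic", "strict", "loose"):
    print(strategy, len(condense(ops, strategy)))
```

For this input the basic strategy keeps four single-operation groups, the strict strategy merges the two adjacent deletions, and the loose strategy also absorbs the neighbouring insertion, which shows how the condensed sequences get shorter as the compatibility rule is relaxed.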