Date Published: February 1, 2019
Publisher: Public Library of Science
Author(s): Lei Qiao, Yan Wang, Jie Zhang.
Effort-aware just-in-time (JIT) defect prediction ranks source code changes by the likelihood that they contain defects as well as the effort needed to inspect them. Accurate defect prediction algorithms help developers find more defects with limited effort. To improve prediction accuracy, in this paper we propose a deep-learning-based approach for effort-aware JIT defect prediction. The key idea of the proposed approach is that neural networks and deep learning can be exploited to select useful features for defect prediction, because they have proved excellent at selecting useful features for classification and regression. First, we preprocess ten numerical metrics of code changes and feed them to a neural network whose output indicates how likely the code change under test is to contain bugs. Second, we compute a benefit-cost ratio for each code change by dividing this likelihood by the change's size. Finally, we rank code changes according to their benefit-cost ratios. Evaluation results on a well-known data set suggest that the proposed approach outperforms the state-of-the-art approaches on each of the subject projects, improving the average recall and popt by 15.6% and 8.1%, respectively.
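The ranking step in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the neural-network likelihoods are stubbed out as plain numbers, and the function and variable names (`rank_changes`, `lines_modified`) are our own, not from the paper.

```python
def rank_changes(changes):
    """Rank code changes by benefit-cost ratio: defect likelihood / size.

    `changes` is a list of (change_id, likelihood, lines_modified) tuples;
    in the paper, `likelihood` would come from the trained neural network.
    """
    scored = [
        # Guard against zero-line changes to avoid division by zero.
        (change_id, likelihood / max(lines_modified, 1))
        for change_id, likelihood, lines_modified in changes
    ]
    # Higher benefit-cost ratio means the change should be inspected earlier.
    return [cid for cid, ratio in sorted(scored, key=lambda s: s[1], reverse=True)]

# Hypothetical scores for three changes.
changes = [("c1", 0.9, 300), ("c2", 0.6, 20), ("c3", 0.2, 5)]
print(rank_changes(changes))  # ['c3', 'c2', 'c1']
```

Note how the small, moderately risky change `c3` outranks the large, high-risk change `c1`: per line of inspection effort, it is the better investment, which is exactly what effort-aware ranking rewards.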
Software quality assurance activities, such as defect prediction and source code inspection, have a great influence on producing high-quality, reliable software [1, 2]. However, such activities become expensive as modern software grows in scale, since these systems are complex and failure-prone. In most cases, developers have limited resources and insufficient time to test the software. To reduce defects and improve the reliability of such software applications, developers should inspect source code changes thoroughly to identify and fix all defects. However, source code inspection is challenging, labor-intensive, tedious, and time-consuming. In addition, testing all units to find bugs is impractical because program budgets are finite and release schedules are tight.
In this section, we present the evaluation of the proposed approach on a publicly available data set. The major purpose of the evaluation is to investigate whether the proposed approach can outperform existing approaches. To this end, we compare it against the state-of-the-art approaches: Kamei et al.'s EALR model, Yang et al.'s LT model, and Huang et al.'s CBS model. All three are well-known just-in-time (i.e., making defect predictions at the change level) and effort-aware (i.e., taking the effort of code inspection into account) defect prediction approaches. To validate the model, we follow Kamei et al., Yang et al., and Huang et al. in using 10-fold cross-validation as the data analysis method, dividing the data set into training data and testing data.
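The 10-fold split used above can be sketched in a few lines. This is a generic k-fold splitter for illustration, with no shuffling or stratification; the paper's exact protocol (ordering of changes, per-project splits) may differ.

```python
def k_fold_splits(n_samples, n_folds=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample appears in exactly one test fold; the remaining folds form
    the training set for that round.
    """
    indices = list(range(n_samples))
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(100, n_folds=10))
print(len(splits))        # 10 rounds
print(len(splits[0][1]))  # 10 test samples in the first fold
```

With 10 folds, each round trains on 90% of the data and tests on the held-out 10%, and every change is used for testing exactly once across the 10 rounds.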
A threat to construct validity is the suitability of the performance evaluation metrics (recall and popt) for the defect prediction approach. They are employed because they have been widely used in previous effort-aware JIT defect prediction studies and have been proved to be effective [13, 18].
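As an illustration of what "effort-aware recall" measures, the sketch below computes the fraction of defect-inducing changes caught when inspecting ranked changes up to a fixed effort budget (commonly 20% of total modified lines in this literature). The function name and tuple layout are our own; popt, which compares the prediction's cumulative-gain curve against optimal and worst-case orderings, is more involved and is omitted here.

```python
def recall_at_effort(ranked, effort_cap=0.2):
    """Fraction of defective changes found within an inspection-effort budget.

    `ranked` is a list of (is_defective, lines_modified) tuples, already
    ordered by the prediction model (best candidates first). Inspection
    stops once `effort_cap` of the total modified lines would be exceeded.
    """
    total_loc = sum(loc for _, loc in ranked)
    total_defects = sum(1 for defective, _ in ranked if defective)
    budget = effort_cap * total_loc
    spent, found = 0, 0
    for is_defective, loc in ranked:
        if spent + loc > budget:
            break  # next change would blow the effort budget
        spent += loc
        if is_defective:
            found += 1
    return found / total_defects if total_defects else 0.0

# Hypothetical ranking: two small defective changes first, then large ones.
ranked = [(True, 5), (True, 10), (False, 50), (True, 35)]
print(recall_at_effort(ranked))  # 2 of 3 defects found within 20% of 100 LOC
```

A model that pushes small defective changes to the top of the ranking scores well on this metric even if a few large defective changes fall outside the budget, which is why effort-aware metrics favor the benefit-cost-ratio ranking used in this paper.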
In this paper, we propose a deep-learning-based approach for effort-aware just-in-time defect prediction. Based on sample changes, the approach trains a neural network to predict how likely a given change is to contain defects. Given a set of changes to be inspected, the proposed approach ranks them according to their risk of defects (predicted by the trained neural network) divided by the required inspection effort (measured by the number of lines modified by the change). The key idea of the proposed approach is that we can exploit neural networks and deep learning to select useful features for defect prediction because they have proved excellent at selecting useful features for classification and regression.