Graph-based machine learning improves just-in-time defect prediction.

Affiliations

AT&T Cybersecurity, AT&T, Atlanta, GA, United States of America.

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.

Publication information

PLoS One. 2023 Apr 13;18(4):e0284077. doi: 10.1371/journal.pone.0284077. eCollection 2023.

DOI:10.1371/journal.pone.0284077

PMID:37053155

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10101485/

Abstract

The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55% and a Matthews correlation coefficient (MCC) as high as 53.16%. This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.
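To make the framing above concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy change history (the developer names, file names, and labels are invented for illustration), builds a bipartite developer-file contribution graph, derives two simple structural features per edge, and scores a hand-written threshold rule with the F1 score and MCC. The threshold rule stands in for a trained graph-based model and is purely hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy history: (developer, file, is_defect_prone).
# Each tuple is one edge of the contribution graph.
changes = [
    ("alice", "core.c", 1),
    ("alice", "util.c", 0),
    ("bob", "core.c", 1),
    ("bob", "net.c", 0),
    ("carol", "util.c", 0),
    ("carol", "core.c", 0),
]


def build_contribution_graph(changes):
    """Return adjacency sets for a bipartite developer-file graph."""
    dev_files = defaultdict(set)
    file_devs = defaultdict(set)
    for dev, path, _ in changes:
        dev_files[dev].add(path)
        file_devs[path].add(dev)
    return dev_files, file_devs


def edge_features(dev, path, dev_files, file_devs):
    """Two structural features of edge (dev, path):
    files touched by the developer, developers touching the file."""
    return len(dev_files[dev]), len(file_devs[path])


def f1_and_mcc(y_true, y_pred):
    """Compute F1 and Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


dev_files, file_devs = build_contribution_graph(changes)

# Illustrative rule only (an assumption, not a learned model):
# flag an edge when its file is touched by three or more developers.
y_true = [label for _, _, label in changes]
y_pred = [
    int(edge_features(dev, path, dev_files, file_devs)[1] >= 3)
    for dev, path, _ in changes
]

f1, mcc = f1_and_mcc(y_true, y_pred)
print(f"F1 = {f1:.4f}, MCC = {mcc:.4f}")
```

In a realistic setting the two degree counts would be replaced by richer graph features or learned embeddings, and the threshold rule by a classifier trained on labelled edges; the metric definitions stay the same.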
