Selecting optimal software code descriptors: The case of Java.

Affiliations

Huawei, Moscow, Russia.

Innopolis University, Innopolis, Russia.

Publication information

PLoS One. 2024 Nov 1;19(11):e0310840. doi: 10.1371/journal.pone.0310840. eCollection 2024.

Abstract

Over the last 25 years, software metrics have proliferated considerably, along with a plethora of tools to extract them. This is a welcome change from the earlier situation of limited data. However, it also creates significant problems from both a theoretical and a practical standpoint. From a theoretical perspective, using many metrics together is likely to result in collinearity, overfitting, and similar issues. From a practical perspective, such a large set of metrics is difficult to manage. Companies, especially small ones, may feel overwhelmed and unable to select a viable subset. So far, it is still not fully understood which subset of metrics is suitable for properly managing software projects and products. In this paper, we attempt to address this issue. We focus on programs written in Java and consider both classes and methods. We use the Sammon error as a measure of the similarity of metrics. Using both Particle Swarm Optimization and a Genetic Algorithm, we adapted a method for identifying a viable subset of metrics that addresses this problem. We experiment with our approach on 800 projects from GitHub and validate the results on 200 projects. With the proposed method, we obtained optimal subsets of software engineering metrics. On the validation dataset, these subsets yielded low Sammon error values in more than 70% of cases at both the class and method levels.
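The abstract relies on the Sammon error without defining it. Classically, let $d^*_{ij}$ be the distance between items $i$ and $j$ in the original space, and let $d_{ij}$ be their distance in a reduced space. The error is then $E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}$.

The minimal Python sketch below is illustrative only. It assumes that each row holds the metric values of one class (or method), and that $d^*$ is computed over all metrics while $d$ uses only a candidate subset of metric columns. It also assumes Euclidean distances. The abstract does not say how the paper normalizes metrics or defines the distances, and the name `sammon_error` is hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

Row = Sequence[float]


def euclidean(a: Row, b: Row) -> float:
    """Plain Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_error(rows: Sequence[Row], subset_cols: Sequence[int]) -> float:
    """Sammon error between distances over all metrics (d*) and over subset_cols (d).

    Assumption: d* uses every column of `rows`, d uses only the selected columns.
    """
    num = 0.0
    denom = 0.0
    for a, b in combinations(rows, 2):
        d_star = euclidean(a, b)
        if d_star == 0.0:
            continue  # identical rows would divide by zero and carry no information
        d = euclidean([a[c] for c in subset_cols], [b[c] for c in subset_cols])
        num += (d_star - d) ** 2 / d_star
        denom += d_star
    return num / denom if denom else 0.0
```

For example, `sammon_error([[1, 0], [3, 0], [6, 0]], [0])` returns `0.0`, because the dropped column is constant. Keeping only column `1` returns `1.0`, because every pairwise distance collapses to zero. With real metrics, one would typically standardize each column first, so that a single large-scale metric does not dominate the distances.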

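The abstract names Particle Swarm Optimization and a Genetic Algorithm as the search methods but does not give their configuration. As an illustration only, the Python sketch below shows a generic genetic algorithm over fixed-size subsets of metric indices. The following choices are all assumptions rather than the authors' settings:

- the fixed subset size `k`
- binary tournament selection
- union-based crossover
- swap mutation
- elitism
- the hypothetical name `ga_select`

PSO is not shown. In the setting the abstract describes, `fitness` would return the Sammon error of a candidate subset.

```python
import random
from typing import Callable, Dict, FrozenSet, List

Subset = FrozenSet[int]


def ga_select(n_metrics: int, k: int, fitness: Callable[[Subset], float],
              pop_size: int = 30, generations: int = 50,
              mutation_rate: float = 0.2, seed: int = 0) -> Subset:
    """Minimal GA sketch: search k-element subsets of metric indices minimizing `fitness`."""
    if not (0 < k <= n_metrics) or pop_size < 2:
        raise ValueError("need 0 < k <= n_metrics and pop_size >= 2")
    rng = random.Random(seed)
    all_idx = list(range(n_metrics))
    cache: Dict[Subset, float] = {}

    def score(s: Subset) -> float:
        # Memoize: a fitness such as a Sammon error over many classes can be expensive.
        if s not in cache:
            cache[s] = fitness(s)
        return cache[s]

    def tournament(pop: List[Subset]) -> Subset:
        # Binary tournament: keep the lower-fitness of two randomly drawn individuals.
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    def crossover(a: Subset, b: Subset) -> Subset:
        # The child draws k indices from the union of both parents' metrics.
        return frozenset(rng.sample(sorted(a | b), k))

    def mutate(s: Subset) -> Subset:
        # With probability mutation_rate, swap one selected metric for an unselected one.
        if k == n_metrics or rng.random() >= mutation_rate:
            return s
        drop = rng.choice(sorted(s))
        add = rng.choice([i for i in all_idx if i not in s])
        return (s - {drop}) | {add}

    population = [frozenset(rng.sample(all_idx, k)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = min(population, key=score)
        children = [elite]  # elitism: the best subset found so far is carried over
        while len(children) < pop_size:
            children.append(mutate(crossover(tournament(population), tournament(population))))
        population = children
    return min(population, key=score)
```

As a toy check, take 8 metrics, `k = 3`, and the fitness `lambda s: len(s ^ frozenset({1, 3, 5}))`. The search is expected to converge to `{1, 3, 5}`. With real data, one would instead pass something like `lambda s: sammon_error(rows, sorted(s))`, where `sammon_error` is a hypothetical helper that computes the Sammon error for the chosen columns. Fixing `k` keeps subsets directly comparable. Letting the size vary would instead require a size penalty in the fitness.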
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13cd/11530023/0984e67a6db1/pone.0310840.g001.jpg
