Pontillo Valeria, Amoroso d'Aragona Dario, Pecorelli Fabiano, Di Nucci Dario, Ferrucci Filomena, Palomba Fabio
Software Engineering (SeSa) Lab - University of Salerno, Fisciano, Italy.
Software Languages (Soft) Lab - Vrije Universiteit Brussel, Brussel, Belgium.
Empir Softw Eng. 2024;29(2):55. doi: 10.1007/s10664-023-10436-2. Epub 2024 Mar 5.
Test smells are symptoms of sub-optimal design choices adopted when developing test cases. Previous studies have shown that they harm test code maintainability and effectiveness. Therefore, researchers have been proposing automated, heuristic-based techniques to detect them. However, the performance of these detectors is still limited and depends on tunable thresholds. We design and evaluate a novel machine learning-based approach to detect four test smells. First, we build the largest dataset of manually validated test smells to enable experimentation. Then, we train six machine learners and assess their capabilities in within-project and cross-project scenarios. Finally, we compare the ML-based approach with state-of-the-art heuristic-based techniques. The key finding of the study is a negative result: the machine learning-based detector performs significantly better than the heuristic-based techniques, but none of the learners exceeds an average F-Measure of 51%. We further discuss the reasons behind this negative result through a qualitative investigation of the current issues and challenges that prevent the accurate detection of test smells, which allowed us to catalog the next steps the research community may pursue to improve test smell detection techniques.
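To make the evaluation metric concrete, the sketch below computes a positive-class F-Measure and averages it over a cross-project setup. The abstract does not detail the features, the six learners, or the exact cross-project protocol. The sketch therefore assumes a leave-one-project-out split, where each project is held out once for testing. It also uses a hypothetical single-threshold "learner" (`train_threshold`) as a stand-in. That stand-in is not one of the study's learners; it echoes the threshold dependence of heuristic detectors.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical instance layout: (project name, feature vector, is_smelly label).
Instance = Tuple[str, List[float], bool]
Model = Callable[[List[float]], bool]


def f_measure(y_true: List[bool], y_pred: List[bool]) -> float:
    """F1 for the positive (smelly) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_project_out(
    data: List[Instance],
) -> Iterator[Tuple[str, List[Instance], List[Instance]]]:
    """Assumed cross-project protocol: train on all other projects, test on one."""
    for held_out in sorted({project for project, _, _ in data}):
        train = [d for d in data if d[0] != held_out]
        test = [d for d in data if d[0] == held_out]
        yield held_out, train, test


def average_cross_project_f(
    data: List[Instance], train_fn: Callable[[List[Instance]], Model]
) -> float:
    """Mean F-Measure across held-out projects."""
    scores = []
    for _, train, test in leave_one_project_out(data):
        model = train_fn(train)
        y_true = [label for _, _, label in test]
        y_pred = [model(features) for _, features, _ in test]
        scores.append(f_measure(y_true, y_pred))
    return sum(scores) / len(scores)


def train_threshold(train: List[Instance]) -> Model:
    """Hypothetical stand-in learner: flag an instance as smelly when feature 0
    reaches the smallest value seen among smelly training instances."""
    threshold = min((x[0] for _, x, smelly in train if smelly), default=float("inf"))
    return lambda features: features[0] >= threshold


# Toy data (invented for illustration, not from the study's dataset).
toy_data: List[Instance] = [
    ("A", [1.0], False), ("A", [5.0], True),
    ("B", [2.0], False), ("B", [6.0], True),
]
print(average_cross_project_f(toy_data, train_threshold))  # prints 0.5
```

In the toy run, the threshold learned on project B (6.0) misses the smelly instance in A, giving an F-Measure of 0. The threshold learned on A (5.0) classifies B perfectly, giving 1. The cross-project average is therefore 0.5. This shows how a threshold fitted on some projects can transfer poorly to others.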