Xie Xiaoyuan, Ho Joshua W K, Murphy Christian, Kaiser Gail, Xu Baowen, Chen Tsong Yueh
Centre for Software Analysis and Testing, Swinburne University of Technology, Hawthorn, Vic 3122 Australia.
J Syst Softw. 2011 Apr 1;84(4):544-558. doi: 10.1016/j.jss.2010.11.920.
Machine learning algorithms provide core functionality in many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because there is often no "test oracle" to verify the correctness of the computed outputs. To help address this software quality problem, in this paper we present a technique for testing the implementations of the machine learning classification algorithms that support such applications. Our approach is based on "metamorphic testing", a technique that has been shown to be effective in alleviating the oracle problem. We also present a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. In addition, we conduct mutation analysis and cross-validation, which reveal that our method is highly effective in killing mutants, and that observing the expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program.
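To make the idea of metamorphic testing concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a toy k-nearest-neighbours classifier and two illustrative metamorphic relations: reordering the training samples, and applying the same affine transformation to every attribute of the training and query data. All names (`knn_predict`, `buggy_knn_predict`, `holds_under_permutation`, `holds_under_affine`) and the data set are hypothetical. Neither check needs to know the correct label for any query; each only compares the outputs of two related runs.

```python
from collections import Counter


def knn_predict(train, labels, query, k=3):
    """Toy k-NN: majority vote among the k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train, labels)
    )
    votes = Counter(y for _, y in ranked[:k])
    return votes.most_common(1)[0][0]


def buggy_knn_predict(train, labels, query, k=3):
    """Seeded fault (hypothetical): an off-by-one that ignores the last sample."""
    return knn_predict(train[:-1], labels[:-1], query, k)


def holds_under_permutation(classify, train, labels, queries, order=None):
    """Relation 1: reordering the training samples must not change predictions."""
    if order is None:
        # Reversal is used as a simple, deterministic permutation.
        order = list(reversed(range(len(train))))
    p_train = [train[i] for i in order]
    p_labels = [labels[i] for i in order]
    return all(
        classify(train, labels, q) == classify(p_train, p_labels, q)
        for q in queries
    )


def holds_under_affine(classify, train, labels, queries, scale=2, shift=5):
    """Relation 2: mapping every attribute v to scale*v + shift (scale > 0)
    in both training and query data must not change predictions."""
    def transform(x):
        return tuple(scale * v + shift for v in x)

    t_train = [transform(x) for x in train]
    return all(
        classify(train, labels, q) == classify(t_train, labels, transform(q))
        for q in queries
    )


# Small made-up data set with two classes, "a" and "b".
TRAIN = [(0, 0), (2, 1), (9, 9), (8, 7), (6, 6), (4, 5)]
LABELS = ["a", "a", "b", "b", "b", "a"]
QUERIES = [(4, 4), (1, 1), (8, 8)]

if __name__ == "__main__":
    for name, clf in [("correct", knn_predict), ("buggy", buggy_knn_predict)]:
        print(
            name,
            "permutation:", holds_under_permutation(clf, TRAIN, LABELS, QUERIES),
            "affine:", holds_under_affine(clf, TRAIN, LABELS, QUERIES),
        )
```

In this sketch the seeded off-by-one fault violates the permutation relation but still satisfies the affine relation, which suggests why a set of several relations is usually checked rather than a single one.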