Statnikov Alexander, Tsamardinos Ioannis, Dosbayev Yerbolat, Aliferis Constantin F
Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA.
Int J Med Inform. 2005 Aug;74(7-8):491-503. doi: 10.1016/j.ijmedinf.2005.05.002.
Successful treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, we have built a system called GEMS (gene expression model selector) for the automated development and evaluation of high-quality cancer diagnostic models, and for biomarker discovery, from microarray gene expression data. To identify the best-performing diagnostic methodologies in this domain and equip the system with them, we first conducted a comprehensive evaluation of classification algorithms using 11 cancer microarray datasets. In this paper we present a preliminary evaluation of the system with five new datasets. The performance of the models produced automatically by GEMS is comparable to or better than the results obtained by human analysts. Additionally, we performed a cross-dataset evaluation of the system: one dataset was used to build a diagnostic model and to estimate its future performance, and the model was then applied to, and evaluated on, a different dataset. We found that models produced by GEMS indeed perform well on independent samples and, furthermore, that the cross-validation performance estimates output by the system closely approximate the error obtained in independent validation. GEMS is freely available for download for non-commercial use from http://www.gems-system.org.
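The cross-dataset evaluation described in the abstract can be illustrated with a minimal sketch. It is not the GEMS implementation: the nearest-centroid classifier, the 5-fold split, the synthetic two-class "expression" data, and every function name below are assumptions chosen only to keep the example short and dependency-free. It estimates error by cross-validation on dataset A, then trains on all of A and measures error on an independent dataset B.

```python
import random

def centroid(rows):
    # Mean expression profile across a list of samples.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(X, y):
    # Hypothetical stand-in classifier: one centroid per class label.
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}

def predict(model, row):
    # Assign the label of the closest centroid (squared Euclidean distance).
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(model, key=lambda label: sq_dist(model[label]))

def error_rate(model, X, y):
    wrong = sum(predict(model, row) != label for row, label in zip(X, y))
    return wrong / len(y)

def cv_error(X, y, k=5, seed=0):
    # k-fold cross-validation: each sample is predicted exactly once
    # by a model that did not see it during training.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, X[i]) != y[i] for i in fold)
    return wrong / len(y)

def make_dataset(n, n_genes, shift, seed):
    # Synthetic data (an assumption): class 1 is shifted by `shift` per gene.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_genes)])
        y.append(label)
    return X, y

# Dataset A: build the model and estimate its future performance.
X_a, y_a = make_dataset(60, 20, 1.0, seed=1)
# Dataset B: independent samples, never used for training.
X_b, y_b = make_dataset(40, 20, 1.0, seed=2)

estimate = cv_error(X_a, y_a)
final_model = fit(X_a, y_a)
independent = error_rate(final_model, X_b, y_b)

print(f"cross-validation error estimate on A: {estimate:.3f}")
print(f"error on independent dataset B:       {independent:.3f}")
```

Comparing the two printed numbers mirrors the question the abstract asks: whether the cross-validation estimate from the training dataset is a good proxy for error on independent samples.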