Anderson Billie, Hardin J Michael, Alexander Dominik D, Grizzle William E, Meleth Sreelatha, Manne Upender
SAS Institute, Cary, NC, USA.
Front Biosci (Elite Ed). 2010 Jun 1;2(3):849-56. doi: 10.2741/e146.
Most discoveries of cancer biomarkers involve construction of a single model to predict survival. 'Data-mining' techniques, such as artificial neural networks (ANNs), perform better than traditional methods, such as logistic regression. In this study, the quality of multiple predictive models built on a molecular data set for colorectal cancer (CRC) was evaluated. Predictive models (logistic regression, ANNs, and decision trees) were compared, and the effect of variable-selection techniques on the predictive quality of these models was investigated. The Kolmogorov-Smirnov (KS) statistic was used to compare the models. Overall, the logistic regression and ANN models outperformed the decision tree. In some instances (e.g., for a model that included 'all variables without tumor stage' and used a decision tree for variable selection), the ANN marginally outperformed logistic regression, although the difference in KS statistics was minimal (0.80 versus 0.82). Regardless of the variables included and the method of variable selection, all three predictive models identified survivors and non-survivors with the same level of statistical accuracy.
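The abstract does not say exactly how the KS statistic was computed for each model. A common convention in classifier evaluation, assumed here, is to score every patient with the model and take the largest vertical gap between the empirical cumulative distributions of scores for non-survivors and survivors: 1.0 means perfect separation, 0.0 means none. The sketch below illustrates that assumption only; the function name `ks_statistic` and the example scores are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def ks_statistic(scores_pos, scores_neg):
    """Hypothetical helper: two-sample KS statistic between model scores.

    Assumes scores_pos are predicted risk scores for one class (e.g.,
    non-survivors) and scores_neg for the other (e.g., survivors).
    Returns the maximum absolute difference between the two empirical CDFs.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both score lists must be non-empty")
    pos = sorted(scores_pos)
    neg = sorted(scores_neg)
    best = 0.0
    # Empirical CDFs are step functions that only change at observed scores,
    # so checking every observed score as a threshold finds the maximum gap.
    for t in sorted(set(pos) | set(neg)):
        cdf_pos = bisect_right(pos, t) / len(pos)  # fraction of pos with score <= t
        cdf_neg = bisect_right(neg, t) / len(neg)  # fraction of neg with score <= t
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Illustrative (made-up) scores: higher means higher predicted risk of death.
non_survivors = [0.9, 0.8, 0.7, 0.4]
survivors = [0.6, 0.3, 0.2, 0.1]
print(ks_statistic(non_survivors, survivors))  # 0.75
```

Under this convention, KS values such as 0.80 and 0.82 would mean the two models separate the outcome groups almost equally well. Comparing models by the same threshold-free statistic avoids choosing a cutoff for each model separately.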