Sailer Fabian, Pobiruchin Monika, Bochum Sylvia, Martens Uwe M, Schramm Wendelin
GECKO Institute, Heilbronn University, Germany.
Cancer Center Heilbronn-Franken, SLK Kliniken Heilbronn GmbH, Germany.
Stud Health Technol Inform. 2015;213:75-8.
Predicting survival time at the time of diagnosis is of great importance for decisions about treatment and long-term follow-up care. However, predicting the outcome of cancer on the basis of clinical information is a challenging task. We examined the ability of ten data mining algorithms (Perceptron, Rule Induction, Support Vector Machine, Linear Regression, Naïve Bayes, Decision Tree, k-Nearest Neighbor, Logistic Regression, Neural Network, Random Forest) to predict the dichotomous attribute "5-year survival" from seven attributes (sex, UICC stage, etc.) that are available at the time of diagnosis. The study used the nationwide German research data set on colon cancer provided by the Robert Koch Institute. To assess the results, we compared the data mining algorithms with physicians' opinions: physicians estimated survival using the same seven attributes. The average accuracy of the physicians' opinions was 59%; the average accuracy of the machine learning algorithms was 67.7%.
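The accuracy comparison described above can be illustrated with a minimal sketch. It assumes that accuracy means the fraction of cases whose dichotomous 5-year-survival label is predicted correctly, and that the average is an unweighted mean over raters (algorithms or physicians); the abstract does not state how the averages were formed. The function names, rater names, and toy labels are hypothetical and are not taken from the study data.

```python
from statistics import mean


def accuracy(predicted, actual):
    """Fraction of cases where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def average_accuracy(predictions_by_rater, actual):
    """Unweighted mean of the per-rater accuracies (an assumption)."""
    return mean(accuracy(p, actual) for p in predictions_by_rater.values())


# Hypothetical toy labels: True means the patient survived at least 5 years.
actual = [True, False, True, True, False]
predictions = {
    "rater_a": [True, False, False, True, False],  # 4 of 5 correct
    "rater_b": [True, True, True, False, False],   # 3 of 5 correct
}

per_rater = {name: accuracy(p, actual) for name, p in predictions.items()}
overall = average_accuracy(predictions, actual)
```

The same two functions apply unchanged whether the raters are classifiers or physicians, which is what makes the two averages comparable.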