Dept. of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA.
Artif Intell Med. 2010 Jul;49(3):187-95. doi: 10.1016/j.artmed.2010.04.009. Epub 2010 May 18.
We consider predictive models for clinical performance of pancreatic cancer patients based on machine learning techniques. The predictive performance of machine learning is compared with that of the linear and logistic regression techniques that dominate the medical oncology literature.
We construct predictive models over a clinical database that we have developed for the University of Massachusetts Memorial Hospital in Worcester, Massachusetts, USA. The database contains retrospective records of 91 patient treatments for pancreatic tumors. Classification and regression targets include patient survival time, Eastern Cooperative Oncology Group (ECOG) quality of life scores, surgical outcomes, and tumor characteristics. The predictive performance of several techniques is described, and specific models are presented.
We show that machine learning techniques attain a predictive performance that is as good as, or better than, that of linear and logistic regression for target attributes that include tumor N and T stage, survival time, and ECOG quality of life scores. Bayesian techniques are found to provide the best performance overall. For tumor size as the target attribute, however, regression performs best: logistic regression when the target is discrete, and linear regression when it is numerical. Preprocessing in the form of attribute selection and supervised attribute discretization improves predictive performance for most of the predictive techniques and target attributes considered.
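To illustrate what supervised attribute discretization means in practice, the following is a minimal sketch, not the procedure used in the study. It assumes a single entropy-based binary split, and the attribute name, class labels, and data values are hypothetical. The idea is to choose the cut point on a numeric attribute that best separates the classes, then replace the numeric values with the resulting bins.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, gain): the threshold on a numeric attribute that
    maximizes information gain with respect to the class labels."""
    pairs = sorted(zip(values, labels))
    sorted_labels = [label for _, label in pairs]
    base = entropy(sorted_labels)
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left, right = sorted_labels[:i], sorted_labels[i:]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut, best_gain = (lo + hi) / 2, gain
    return best_cut, best_gain


def discretize(values, cut):
    """Map each numeric value to a two-level nominal attribute."""
    return ["low" if v <= cut else "high" for v in values]


# Hypothetical toy data: a numeric attribute and a binary class label.
sizes_cm = [1.2, 1.8, 2.1, 2.9, 3.4, 4.0, 4.6, 5.3]
outcome = ["good", "good", "good", "good", "poor", "poor", "poor", "poor"]

cut, gain = best_cut_point(sizes_cm, outcome)
binned = discretize(sizes_cm, cut)
```

Because the cut point is chosen using the class labels, a real evaluation would have to select it on the training data only, inside each cross-validation fold, so that the held-out records do not influence the preprocessing.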
Machine learning provides techniques for improved prediction of clinical performance. These techniques therefore merit consideration as valuable alternatives to traditional multivariate regression techniques in clinical medical studies.