Tan Chao, Chen Hui, Xia Chengyun
Department of Chemistry and Chemical Engineering, Yibin University, Yibin, Sichuan, 644007, PR China.
J Pharm Biomed Anal. 2009 Apr 5;49(3):746-52. doi: 10.1016/j.jpba.2008.12.010. Epub 2008 Dec 24.
Early detection of cancer is the key to effective treatment and long-term survival. Lung cancer is one of the most frequently occurring cancers, and its early detection is of particular interest. This work investigates the feasibility of combining Adaboost (an ensemble method from machine learning), using decision stumps as weak classifiers, with trace element analysis to predict early lung cancer. A dataset of 9 trace elements determined in 122 urine samples is used for illustration. The Kennard and Stone (KS) algorithm coupled with an alternate re-sampling was used to partition the samples: the whole dataset was split into equally sized training and test sets (case A), and the two sets were then swapped to yield a second operating case (case B). The predictions of Adaboost were compared with those of Fisher discriminant analysis (FDA). On the test set, the final Adaboost classifiers achieved a sensitivity of 100% in both cases, a specificity of 93.8% (case A) and 95.7% (case B), and an overall accuracy of 95.1% (case A) and 96.7% (case B). In both cases Adaboost outperformed FDA; it was also less sensitive to the composition of the training set, and its over-fitting was easier to control. Adaboost therefore appears superior to FDA for the present task, indicating that integrating Adaboost with trace element analysis of urine can serve as a useful tool for diagnosing early lung cancer in clinical practice.
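The abstract combines three computational steps: Kennard and Stone partitioning, Adaboost with decision stumps, and sensitivity, specificity and accuracy on the test set. The Python sketch below shows one way these steps could fit together. It is not the authors' code. It assumes NumPy, labels coded +1 (cancer) and -1 (control), Euclidean distances for KS, and discrete Adaboost with at most 50 rounds. All function names are hypothetical. The authors' "alternate re-sampling" step is not reproduced because the abstract does not describe it, and FDA is omitted. The data in the usage section are synthetic random numbers, not the 122 urine samples.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Plain Kennard-Stone selection (Euclidean distance); returns row indices."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)  # the two most distant samples
    selected = [int(i), int(j)]
    min_d = np.minimum(d[i], d[j])  # distance of each sample to the selected set
    min_d[selected] = -np.inf
    while len(selected) < n_select:
        k = int(np.argmax(min_d))  # sample farthest from everything chosen so far
        selected.append(k)
        min_d = np.minimum(min_d, d[k])
        min_d[selected] = -np.inf
    return np.array(selected)


def stump_predict(X, j, thr, pol):
    # pol=+1 predicts +1 when x_j > thr; pol=-1 flips the rule
    return np.where(pol * (X[:, j] - thr) > 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X, j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost_fit(X, y, n_rounds=50):
    """Discrete Adaboost with decision stumps; labels y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        if err >= 0.5:  # weak learner no better than chance: stop boosting
            break
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, pol))
        w /= w.sum()  # misclassified samples gain weight for the next round
        ensemble.append((alpha, j, thr, pol))
        if err <= 1e-10:  # training set already separated
            break
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(X.shape[0])
    for alpha, j, thr, pol in ensemble:
        score += alpha * stump_predict(X, j, thr, pol)
    return np.where(score >= 0, 1, -1)


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy with +1 = cancer, -1 = control."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)


if __name__ == "__main__":
    # Synthetic stand-in for 122 samples x 9 trace elements (NOT the paper's data)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(122, 9))
    y = np.where(X[:, 0] + 0.5 * X[:, 3] > 0.5, 1, -1)
    half_a = kennard_stone(X, 61)
    half_b = np.setdiff1d(np.arange(122), half_a)
    # Case A trains on the KS half; case B swaps training and test sets
    for name, tr, te in (("A", half_a, half_b), ("B", half_b, half_a)):
        model = adaboost_fit(X[tr], y[tr])
        sens, spec, acc = diagnostic_metrics(y[te], adaboost_predict(model, X[te]))
        print(f"case {name}: sens={sens:.3f} spec={spec:.3f} acc={acc:.3f}")
```

Swapping `half_a` and `half_b` in the loop mirrors how case B is obtained by reversing the training and test sets of case A. The stump search is deliberately exhaustive, which is affordable for only 9 features and about 61 training samples.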