College of Public Health, Zhengzhou University.
The First Affiliated Hospital of Zhengzhou University.
Eur J Cancer Prev. 2021 Jan;30(1):15-20. doi: 10.1097/CEJ.0000000000000598.
The development of lung cancer generates a large amount of abnormal information, and extracting useful knowledge from this mass of data is an urgent problem. Data mining has become a popular tool for medical classification and prediction. However, each technique has its own advantages and disadvantages, so this study applied several data mining methods in a stepwise analysis and compared the prediction results of the resulting models. A total of 180 patients with lung cancer and 243 individuals with benign lung disease were included from the First Affiliated Hospital of Zhengzhou University between October 2014 and March 2016. Prediction models based on epidemiological data, clinical features and tumor markers were built with an artificial neural network (ANN), the C5.0 decision tree and a support vector machine (SVM). The lung cancer group and the benign group differed significantly in seven tumor markers and 10 epidemiological and clinical indicators. The accuracy rates of ANN, C5.0 and SVM were 76.47%, 89.92% and 85.71%, respectively. Receiver operating characteristic (ROC) curve analysis gave an area under the curve (AUC) of 0.811 (0.770-0.847) for ANN, 0.897 (0.864-0.924) for C5.0 and 0.878 (0.843-0.908) for SVM. Of the three models, the C5.0 decision tree had the lowest error rate and the highest accuracy, and it could be used to diagnose lung cancer.
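The two metrics used to compare the models, accuracy and AUC, can be illustrated with a minimal sketch. The abstract does not describe the software or the evaluation code the authors used, so this is not their method. The function names `accuracy` and `roc_auc` and the toy labels and scores are hypothetical and are not data from the study. AUC is computed here as the fraction of (cancer, benign) pairs in which the cancer case receives the higher score, with ties counted as half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Equivalent to the area under the ROC curve: the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case, counting ties as 0.5.
    """
    if len(y_true) != len(scores):
        raise ValueError("inputs must be of equal length")
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = lung cancer, 0 = benign lung disease.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # model output per case
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 cut-off

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

Accuracy depends on the chosen cut-off (0.5 here is an assumption), whereas AUC summarizes ranking quality across all cut-offs, which is why a model comparison like the one above typically reports both.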