Zhou Lihong, Wang Yiwen, Zhu Wenchao, Zhao Yafang, Yu Yihang, Hu Qin, Yu Wenke
Zhejiang Tuberculosis Diagnosis and Treatment Center, Zhejiang Chinese and Western Medicine Integrated Hospital, Hangzhou, Zhejiang, China.
Department of Clinical Medical Engineering, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
Ann Med. 2024 Dec;56(1):2401613. doi: 10.1080/07853890.2024.2401613. Epub 2024 Sep 16.
To evaluate the effectiveness of a machine learning approach based on computed tomography (CT) radiomics in distinguishing nontuberculous mycobacterial pulmonary disease (NTM-PD) from pulmonary tuberculosis (PTB).
In this retrospective analysis, the medical records of 99 patients with NTM-PD and 285 patients with PTB at Zhejiang Chinese and Western Medicine Integrated Hospital were reviewed. Computer-generated random numbers were used to divide the study cohort, with 80% assigned to the training cohort and 20% to the validation cohort. A total of 2153 radiomics features were extracted from the CT images of the large lesion areas using Python (Pyradiomics package). Significant features were identified using least absolute shrinkage and selection operator (LASSO) regression. Four supervised learning classifiers were developed: random forest (RF), support vector machine (SVM), logistic regression (LR), and extreme gradient boosting (XGBoost). The predictive performance of these models was assessed and compared using receiver-operating characteristic (ROC) curves and the areas under the ROC curves (AUCs).
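The 80/20 random division described above can be illustrated with a minimal sketch. It assumes a simple random split of the whole cohort. The abstract does not say whether the split preserved the NTM-PD/PTB ratio, and the function name `split_cohort`, the seed, and the placeholder patient IDs are hypothetical.

```python
import random


def split_cohort(patient_ids, train_fraction=0.8, seed=0):
    """Randomly partition patient IDs into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) for a reproducible split
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Placeholder IDs only: 99 NTM-PD and 285 PTB cases, matching the cohort sizes.
cohort = [f"NTM-{i}" for i in range(99)] + [f"PTB-{i}" for i in range(285)]
train_ids, valid_ids = split_cohort(cohort)
print(len(train_ids), len(valid_ids))  # 307 77
```

With 384 patients in total, an 80/20 split of this kind yields 307 training and 77 validation cases. The exact counts used in the study are not given in the abstract.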
Student's t-test, the Levene test, and the LASSO algorithm together selected 23 optimal features. In ROC analysis, the AUCs of the XGBoost, LR, SVM, and RF models were 1, 0.9044, 0.8868, and 0.7982, respectively, in the training cohort and 0.8358, 0.8085, 0.87739, and 0.7759, respectively, in the validation cohort. The DeLong test showed no significant differences among the models.
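To make the AUC values above concrete, the following minimal sketch computes an AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = NTM-PD, 0 = PTB, scores = predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 10 of 12 positive-negative pairs ranked correctly
print(round(auc, 4))
```

Under this definition, an AUC of 1, as reported for XGBoost in the training cohort, means that every positive case was scored above every negative case in that cohort.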
CT radiomics features can help distinguish NTM-PD from PTB. Among the four classifiers, SVM showed stable performance in identifying these two diseases.