Chen Tzu-Tao, Cheng Tzu-Yu, Liu I-Jung, Ho Shu-Chuan, Lee Kang-Yun, Huang Huei-Tyng, Feng Po-Hao, Chen Kuan-Yuan, Luo Ching-Shan, Tseng Chien-Hua, Chen Yueh-His, Majumdar Arnab, Tsai Cheng-Yu, Wu Sheng-Ming
Division of Pulmonary Medicine, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan.
Division of Pulmonary Medicine, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan.
Diagnostics (Basel). 2025 May 3;15(9):1165. doi: 10.3390/diagnostics15091165.
Chronic obstructive pulmonary disease (COPD) remains a leading cause of death worldwide, and emphysema progression provides valuable insight into disease development. Current clinical assessments, including pulmonary function tests and high-resolution computed tomography, are limited by accessibility constraints and, in the case of computed tomography, by radiation exposure. This study therefore proposed an alternative approach that integrates the novel biomarker long non-coding interleukin-7 receptor α-subunit gene, together with other easily accessible clinical and biochemical metrics, into machine learning (ML) models.
This cohort study collected baseline characteristics, COPD Assessment Test (CAT) scores, and biochemical data from the enrolled participants. Associations with emphysema severity, defined by a low attenuation area percentage (LAA%) threshold of 15%, were evaluated using simple and multivariate-adjusted models. The dataset was then split into a training and validation subset (80%) and a test subset (20%). Five ML models were compared, and the best-performing model was further analyzed for feature importance.
Most participants were elderly men. Compared with the LAA% < 15% group, the LAA% ≥ 15% group had a significantly higher body mass index (BMI), poorer pulmonary function, and lower expression of this biomarker (all p < 0.01). Fold changes in its expression were strongly and negatively associated with LAA% (p < 0.01). The random forest (RF) model achieved the highest accuracy and area under the receiver operating characteristic curve (AUROC) across datasets. Feature importance analysis identified the biomarker's expression fold change as the strongest predictor for emphysema classification (LAA% ≥ 15%), followed by CAT score and BMI.
Machine learning models incorporating accessible clinical and biochemical markers, particularly this novel long non-coding biomarker, achieved classification accuracy and AUROC exceeding 75% in emphysema assessment.
These findings offer promising opportunities for improving emphysema classification and COPD management.
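The abstract defines emphysema severity by an LAA% threshold of 15% but does not state the attenuation cutoff used to count low-attenuation voxels. The minimal Python sketch below assumes the commonly used inspiratory-CT cutoff of -950 Hounsfield units (HU) and an already segmented list of lung-voxel values. The cutoff, the input format, and the function names are illustrative assumptions, not the authors' pipeline.

```python
from typing import Sequence

# Assumption: voxels below -950 HU count as low attenuation (a common
# inspiratory-CT convention); the abstract does not state the cutoff used.
LOW_ATTENUATION_HU = -950.0
# Threshold stated in the abstract for defining emphysema severity.
LAA_THRESHOLD_PERCENT = 15.0


def laa_percent(lung_voxels_hu: Sequence[float], cutoff_hu: float = LOW_ATTENUATION_HU) -> float:
    """Percentage of segmented lung voxels with attenuation below cutoff_hu."""
    if len(lung_voxels_hu) == 0:
        raise ValueError("need at least one lung voxel")
    low = sum(1 for hu in lung_voxels_hu if hu < cutoff_hu)
    return 100.0 * low / len(lung_voxels_hu)


def emphysema_label(laa: float, threshold: float = LAA_THRESHOLD_PERCENT) -> int:
    """Binary classification target: 1 if LAA% >= threshold, otherwise 0."""
    return 1 if laa >= threshold else 0


if __name__ == "__main__":
    # Toy lung: 20 of 100 voxels fall below -950 HU.
    voxels = [-980.0] * 20 + [-850.0] * 80
    laa = laa_percent(voxels)
    print(laa, emphysema_label(laa))  # 20.0 1
```

The binary label produced this way is the kind of target the classifiers in the abstract are trained to predict (LAA% ≥ 15% versus LAA% < 15%).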
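The abstract reports fold changes in the biomarker's expression without describing how they were derived. If expression was measured by quantitative PCR, one common convention is the Livak 2^(-ΔΔCt) method, sketched below under that assumption. The reference gene, the calibrator group, the Ct values, and the function names are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, reference_ct: float) -> float:
    """Normalize the target-gene Ct to a reference (housekeeping) gene Ct."""
    return target_ct - reference_ct


def fold_change(sample_dct: float, calibrator_dcts: Sequence[float]) -> float:
    """Livak 2^-ddCt fold change of one sample relative to the calibrator group mean."""
    ddct = sample_dct - mean(calibrator_dcts)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: calibrator samples have dCt = 5.0; the sample's
    # target needs one extra cycle, so it has half the calibrator's expression.
    calibrators = [delta_ct(27.0, 22.0), delta_ct(26.5, 21.5)]
    sample = delta_ct(28.0, 22.0)
    print(fold_change(sample, calibrators))  # 0.5
```

Under this convention, a fold change below 1 means lower expression than the calibrator group, which is the direction the abstract reports for the LAA% ≥ 15% group.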
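The abstract describes an 80%/20% split and compares models by accuracy and AUROC. The following plain-Python sketch shows how those pieces fit together. It assumes a class-stratified split (the abstract does not say whether the split was stratified) and takes a model's predicted probabilities as given, so it does not reproduce the random forest or the other four models. AUROC is computed as the probability that a randomly chosen positive case (LAA% ≥ 15%) receives a higher score than a randomly chosen negative case, with ties counting one half. This quantity equals the area under the ROC curve.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), holding out about test_fraction of each class.

    Stratification is an assumption; the abstract only states an 80%/20% split.
    """
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of cases whose predicted class matches the true class."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC requires at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = LAA% >= 15%) and model probabilities on a test subset.
    y_test = [0, 0, 1, 1]
    p_test = [0.1, 0.4, 0.35, 0.8]
    y_pred = [int(p >= 0.5) for p in p_test]
    print(accuracy(y_test, y_pred), auroc(y_test, p_test))  # 0.75 0.75
```

The pairwise AUROC loop costs O(n_pos × n_neg) comparisons. That is acceptable for a clinical cohort of modest size, but a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` scales better.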