Machine Learning Prediction of Progression in Forced Expiratory Volume in 1 Second in the COPDGene® Study.

Author information

Boueiz Adel, Xu Zhonghui, Chang Yale, Masoomi Aria, Gregory Andrew, Lutz Sharon M, Qiao Dandi, Crapo James D, Dy Jennifer G, Silverman Edwin K, Castaldi Peter J

Affiliations

Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States.

Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States.

Publication information

Chronic Obstr Pulm Dis. 2022 Jul 29;9(3):349-365. doi: 10.15326/jcopdf.2021.0275.

Abstract

BACKGROUND

The heterogeneous nature of chronic obstructive pulmonary disease (COPD) complicates the identification of the predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features.

METHODS

We included 4496 smokers with available data from their enrollment and 5-year follow-up visits in the COPD Genetic Epidemiology (COPDGene) study. We constructed linear regression (LR) and supervised random forest models to predict 5-year progression in forced expiratory volume in 1 second (FEV1) from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit.
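The modeling setup described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of COPDGene measurements, and the sample size, fold count, and hyperparameters are arbitrary choices made for illustration. Only the number of baseline features (46) comes from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_participants, n_features = 400, 46  # 46 matches the abstract; 400 is arbitrary

# Synthetic stand-ins for baseline features and 5-year FEV1 change (NOT study data).
X = rng.normal(size=(n_participants, n_features))
y = (
    0.5 * X[:, 0]
    - 0.3 * X[:, 1] * (X[:, 2] > 0)  # a simple interaction a tree model can pick up
    + rng.normal(scale=1.0, size=n_participants)
)

# Random partition into training and testing folds (5 folds is an assumption).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

r2_by_model = {}
for name, model in models.items():
    # Each prediction comes from a model fitted without that participant's fold.
    y_pred = cross_val_predict(model, X, y, cv=cv)
    r2_by_model[name] = r2_score(y, y_pred)
    print(f"{name}: out-of-fold R^2 = {r2_by_model[name]:.3f}")
```

Scoring out-of-fold predictions, rather than predictions on the training data, is what makes the comparison between the two model families meaningful.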

RESULTS

Predicting the change in FEV1 over time is more challenging than predicting the future absolute FEV1 level. For random forest, R-squared was 0.15 and the area under the receiver operating characteristic (ROC) curve for identifying participants in the top quartile of observed progression was 0.71 in the testing sample; the corresponding values in the validation sample were 0.10 and 0.70. Random forest provided slightly better performance than LR. Accuracy was best for Global Initiative for Chronic Obstructive Lung Disease (GOLD) grades 1-2 participants, and accurate prediction was harder to achieve in advanced stages of the disease. Predictive variables differed in their relative importance, both overall and for predictions stratified by GOLD grade.
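The two metrics reported above can be computed as in the following sketch. It is plain Python, it is not the authors' code, and it rests on two labeled assumptions: progression is expressed as the size of the FEV1 decline (larger means faster progression), and the "top quartile" is the 25% of participants with the largest observed decline. All function names and the sample values are hypothetical.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def top_quartile_labels(observed_decline):
    """Label 1 for the quarter of participants with the largest observed decline."""
    n_top = len(observed_decline) // 4
    cutoff = sorted(observed_decline, reverse=True)[n_top - 1]
    return [1 if d >= cutoff else 0 for d in observed_decline]


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed and predicted 5-year FEV1 declines in litres (synthetic).
random.seed(0)
observed = [random.gauss(0.2, 0.25) for _ in range(200)]
predicted = [0.2 + 0.3 * (o - 0.2) + random.gauss(0, 0.1) for o in observed]

labels = top_quartile_labels(observed)
print(f"R^2 = {r_squared(observed, predicted):.2f}")
print(f"AUC for top-quartile progression = {roc_auc(labels, predicted):.2f}")
```

The two numbers answer different questions: R-squared measures how much of the variation in FEV1 change the model explains, while the AUC measures only how well the predictions rank rapid progressors above everyone else. That is why a model can have a low R-squared and still discriminate moderately well.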

CONCLUSION

Random forest, along with deep phenotyping, predicts FEV1 progression with reasonable accuracy, although there is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.
