Machine learning computational model to predict lung cancer using electronic medical records.

Author affiliations

Adelson School of Medicine, Ariel University, Ariel, Israel.

Department of Mathematics, Ariel University, Ariel, Israel; Department of Cancer Biology, Cancer Institute, University College London, London, UK.

Publication information

Cancer Epidemiol. 2024 Oct;92:102631. doi: 10.1016/j.canep.2024.102631. Epub 2024 Jul 24.

DOI:10.1016/j.canep.2024.102631
PMID:39053365
Abstract

BACKGROUND

Lung cancer (LC) screening using low-dose computed tomography (CT) is recommended according to standard risk criteria or personalized risk calculators. Machine learning (ML) models that can predict disease risk are an emerging method in medicine for identifying hidden associations that are unique to the individual.

MATERIALS AND METHODS

Using the Tree-based Pipeline Optimization Tool (TPOT), we developed an ML-based model, an ensemble of the Random Forest and XGBoost models, based on known risk factors for LC, as part of a larger trial for ML prediction using electronic medical records and chest CT. We used data from patients with LC and controls (1:2), all aged ≥ 35 years. We developed a model for the entire study population as well as for patients with and without a smoking background. Predictors included age, gender, body mass index (BMI), smoking history, socioeconomic status (SES), history of chronic obstructive pulmonary disease (COPD)/emphysema/chronic bronchitis (CB), interstitial lung disease (ILD)/pulmonary fibrosis (PF), and family history of LC.
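The abstract does not say how the Random Forest and XGBoost outputs are combined. One common option is soft voting, which averages the two models' predicted probabilities and thresholds the result. The sketch below illustrates that idea only; the function name `soft_vote`, the 0.5 threshold, and the probability values are all hypothetical and are not taken from the paper.

```python
def soft_vote(prob_a, prob_b, threshold=0.5):
    """Average two models' predicted probabilities and threshold the mean.

    prob_a, prob_b: per-patient probabilities of LC from two models.
    Returns (averaged probabilities, predicted labels).
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same patients")
    averaged = [(a + b) / 2 for a, b in zip(prob_a, prob_b)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities for three patients (not data from the study).
rf_probs = [0.80, 0.40, 0.10]   # stand-in for Random Forest output
xgb_probs = [0.60, 0.70, 0.20]  # stand-in for XGBoost output

averaged, labels = soft_vote(rf_probs, xgb_probs)
print(averaged, labels)
```

In this toy case the second patient is classified as positive only because the second model's higher score pulls the average above the threshold, which is the kind of disagreement an ensemble is meant to resolve.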

RESULTS

Of the 4076 patients, 1428 (35%) were in the LC group and 2648 (65%) were in the control group. For the entire study population, our model achieved an accuracy of 71.2%, with a sensitivity of 69% and a positive predictive value (PPV) of 74%. Higher accuracy was achieved for the two subgroups: 74.8% (sensitivity 72%, PPV 76%) for the smoking cohort and 73.0% (sensitivity 76%, PPV 72%) for the never-smoking cohort. For the entire population and the smoker cohort, COPD/emphysema/CB were the most important contributors, followed by BMI and age, while in the never-smoking cohort, BMI, age and SES were the most important contributors.
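Accuracy, sensitivity, and PPV are all ratios over confusion-matrix counts. The sketch below writes out those definitions and re-checks the group shares reported above. The abstract does not report the confusion counts, so the values passed to `classification_metrics` (a hypothetical helper name) are invented for illustration and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and PPV from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # correct predictions / all patients
        "sensitivity": tp / (tp + fn),   # detected cases / all true cases
        "ppv": tp / (tp + fp),           # true cases / all positive predictions
    }


# Group sizes as reported in the abstract.
lc_patients, controls = 1428, 2648
total_patients = lc_patients + controls
lc_share = lc_patients / total_patients
control_share = controls / total_patients
print(total_patients, round(lc_share * 100), round(control_share * 100))

# Hypothetical confusion counts (assumption, not from the paper).
example = classification_metrics(tp=70, fp=25, tn=75, fn=30)
print(example)
```

PPV depends on how common the disease is in the sample, so a PPV measured in a roughly 1:2 case-control sample would not carry over directly to a screening population in which LC is much rarer.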

CONCLUSION

Known risk factors for LC can be used in ML models to predict LC with modest accuracy. Further studies are needed to confirm these results in new patients and to improve them.

Similar articles

1
Machine learning computational model to predict lung cancer using electronic medical records.
Cancer Epidemiol. 2024 Oct;92:102631. doi: 10.1016/j.canep.2024.102631. Epub 2024 Jul 24.
2
Deep Learning Using Chest Radiographs to Identify High-Risk Smokers for Lung Cancer Screening Computed Tomography: Development and Validation of a Prediction Model.
Ann Intern Med. 2020 Nov 3;173(9):704-713. doi: 10.7326/M20-1868. Epub 2020 Sep 1.
3
Identification of COPD Patients at High Risk for Lung Cancer Mortality Using the COPD-LUCSS-DLCO.
Chest. 2016 Apr;149(4):936-42. doi: 10.1378/chest.15-1868. Epub 2016 Jan 12.
4
Diagnosis of chronic obstructive pulmonary disease in lung cancer screening Computed Tomography scans: independent contribution of emphysema, air trapping and bronchial wall thickening.
Respir Res. 2013 May 27;14(1):59. doi: 10.1186/1465-9921-14-59.
5
Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study.
PLoS Med. 2023 Oct 3;20(10):e1004287. doi: 10.1371/journal.pmed.1004287. eCollection 2023 Oct.
6
Low-dose CT screening among never-smokers with or without a family history of lung cancer in Taiwan: a prospective cohort study.
Lancet Respir Med. 2024 Feb;12(2):141-152. doi: 10.1016/S2213-2600(23)00338-7. Epub 2023 Nov 29.
7
Low positive predictive value of computed tomography screening for lung cancer irrespective of commonly employed definitions of target population.
Int J Cancer. 2021 Jul 1;149(1):58-65. doi: 10.1002/ijc.33522. Epub 2021 Mar 20.
8
Validation of a Deep Learning-Based Model to Predict Lung Cancer Risk Using Chest Radiographs and Electronic Medical Record Data.
JAMA Netw Open. 2022 Dec 1;5(12):e2248793. doi: 10.1001/jamanetworkopen.2022.48793.
9
Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study.
BMC Med Res Methodol. 2024 May 17;24(1):114. doi: 10.1186/s12874-024-02231-4.
10
Lung cancer risk prediction: Prostate, Lung, Colorectal And Ovarian Cancer Screening Trial models and validation.
J Natl Cancer Inst. 2011 Jul 6;103(13):1058-68. doi: 10.1093/jnci/djr173. Epub 2011 May 23.

Cited by

1
The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning.
Patterns (N Y). 2025 Jul 11;6(7):101314. doi: 10.1016/j.patter.2025.101314.
2
Molecular Subtypes and Biomarkers of Ulcerative Colitis Revealed by Sphingolipid Metabolism-Related Genes: Insights from Machine Learning and Molecular Dynamics.
Curr Issues Mol Biol. 2025 Aug 4;47(8):616. doi: 10.3390/cimb47080616.
3
Artificial intelligence across the cancer care continuum.
Cancer. 2025 Aug 15;131(16):e70050. doi: 10.1002/cncr.70050.
4
A Machine Learning-Based Guide for Repeated Laboratory Testing in Pediatric Emergency Departments.
Diagnostics (Basel). 2025 Jul 28;15(15):1885. doi: 10.3390/diagnostics15151885.
5
Machine Learning Unveils Sphingolipid Metabolism's Role in Tumour Microenvironment and Immunotherapy in Lung Cancer.
J Cell Mol Med. 2025 Apr;29(7):e70435. doi: 10.1111/jcmm.70435.