Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study.

Affiliations

Duke Institute for Health Innovation, Durham, North Carolina, United States of America.

Department of Statistical Sciences, Duke University, Durham, North Carolina, United States of America.

Publication information

PLoS Med. 2018 Nov 27;15(11):e1002701. doi: 10.1371/journal.pmed.1002701. eCollection 2018 Nov.

DOI:10.1371/journal.pmed.1002701

PMID:30481172

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6258507/

Abstract

BACKGROUND

Pythia is an automated, clinically curated surgical data pipeline and repository housing all surgical patient electronic health record (EHR) data from a large, quaternary, multisite health institute for data science initiatives. In an effort to better identify high-risk surgical patients from complex data, a machine learning project trained on Pythia was built to predict postoperative complication risk.

METHODS AND FINDINGS

A curated data repository of surgical outcomes was created using automated SQL and R code that extracted and processed patient clinical and surgical data across 37 million clinical encounters from the EHRs. A total of 194 clinical features including patient demographics (e.g., age, sex, race), smoking status, medications, comorbidities, procedure information, and proxies for surgical complexity were constructed and aggregated. A cohort of 66,370 patients that had undergone 99,755 invasive procedural encounters between January 1, 2014, and January 31, 2017, was studied further for the purpose of predicting postoperative complications. The average complication and 30-day postoperative mortality rates of this cohort were 16.0% and 0.51%, respectively. Least absolute shrinkage and selection operator (lasso) penalized logistic regression, random forest models, and extreme gradient boosted decision trees were trained on this surgical cohort with cross-validation on 14 specific postoperative outcome groupings. Resulting models had area under the receiver operator characteristic curve (AUC) values ranging between 0.747 and 0.924, calculated on an out-of-sample test set from the last 5 months of data. Lasso penalized regression was identified as a high-performing model, providing clinically interpretable actionable insights. Highest and lowest performing lasso models predicted postoperative shock and genitourinary outcomes with AUCs of 0.924 (95% CI: 0.901, 0.946) and 0.780 (95% CI: 0.752, 0.810), respectively. A calculator requiring input of 9 data fields was created to produce a risk assessment for the 14 groupings of postoperative outcomes. A high-risk threshold (15% risk of any complication) was determined to identify high-risk surgical patients. The model sensitivity was 76%, with a specificity of 76%. 
Compared to heuristics that identify high-risk patients developed by clinical experts and the ACS NSQIP calculator, this tool performed superiorly, providing an improved approach for clinicians to estimate postoperative risk for patients. Limitations of this study include the missingness of data that were removed for analysis.
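As a rough illustration of the evaluation metrics reported above (AUC, and sensitivity and specificity at a 15% risk threshold), the following minimal Python sketch computes them on a tiny invented example. The labels and predicted risks are made up for illustration and are not Pythia data; the function names are hypothetical, and treating a risk of exactly 15% as "high risk" (`>=`) is an assumption, since the abstract does not say how ties at the threshold are handled.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.15
) -> Tuple[float, float]:
    """Flag a patient as high risk when predicted risk >= threshold
    (assumed convention), then compare the flags with observed outcomes."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        flagged = s >= threshold
        if y == 1:
            tp += int(flagged)
            fn += int(not flagged)
        else:
            fp += int(flagged)
            tn += int(not flagged)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: 1 = any complication observed, 0 = none.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.40, 0.22, 0.16, 0.08, 0.30, 0.12, 0.10, 0.05, 0.03, 0.02]

toy_auc = auc(y_true, y_score)
toy_sens, toy_spec = sensitivity_specificity(y_true, y_score, threshold=0.15)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

Lowering the threshold flags more patients, which raises sensitivity at the cost of specificity; the AUC summarizes that trade-off across all possible thresholds.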

CONCLUSIONS

Extracting and curating a large, local institution's EHR data for machine learning purposes resulted in models with strong predictive performance. These models can be used in clinical settings as decision support tools for identification of high-risk patients as well as patient evaluation and care management. Further work is necessary to evaluate the impact of the Pythia risk calculator within the clinical workflow on postoperative outcomes and to optimize this data flow for future machine learning efforts.
