Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study.

Affiliations

Duke Institute for Health Innovation, Durham, North Carolina, United States of America.

Department of Statistical Sciences, Duke University, Durham, North Carolina, United States of America.

Publication information

PLoS Med. 2018 Nov 27;15(11):e1002701. doi: 10.1371/journal.pmed.1002701. eCollection 2018 Nov.

Abstract

BACKGROUND

Pythia is an automated, clinically curated surgical data pipeline and repository housing all surgical patient electronic health record (EHR) data from a large, quaternary, multisite health institute for data science initiatives. In an effort to better identify high-risk surgical patients from complex data, a machine learning project trained on Pythia was built to predict postoperative complication risk.

METHODS AND FINDINGS

A curated data repository of surgical outcomes was created using automated SQL and R code that extracted and processed patient clinical and surgical data across 37 million clinical encounters from the EHRs. A total of 194 clinical features, including patient demographics (e.g., age, sex, race), smoking status, medications, comorbidities, procedure information, and proxies for surgical complexity, were constructed and aggregated. A cohort of 66,370 patients who had undergone 99,755 invasive procedural encounters between January 1, 2014, and January 31, 2017, was studied further for the purpose of predicting postoperative complications. The average complication and 30-day postoperative mortality rates of this cohort were 16.0% and 0.51%, respectively. Least absolute shrinkage and selection operator (lasso) penalized logistic regression, random forest models, and extreme gradient boosted decision trees were trained on this surgical cohort with cross-validation on 14 specific postoperative outcome groupings. The resulting models had area under the receiver operating characteristic curve (AUC) values ranging between 0.747 and 0.924, calculated on an out-of-sample test set from the last 5 months of data. Lasso penalized regression was identified as a high-performing model, providing clinically interpretable, actionable insights. The highest- and lowest-performing lasso models predicted postoperative shock and genitourinary outcomes with AUCs of 0.924 (95% CI: 0.901, 0.946) and 0.780 (95% CI: 0.752, 0.810), respectively. A calculator requiring input of 9 data fields was created to produce a risk assessment for the 14 groupings of postoperative outcomes. A high-risk threshold (15% risk of any complication) was determined to identify high-risk surgical patients. At this threshold, the model had a sensitivity of 76% and a specificity of 76%. Compared with heuristics developed by clinical experts to identify high-risk patients and with the ACS NSQIP calculator, this tool performed better, providing an improved approach for clinicians to estimate postoperative risk for patients. Limitations of this study include missing data, which were removed before analysis.
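To make the reported metrics concrete, the following is a minimal Python sketch of how an AUC and the sensitivity and specificity at a fixed risk threshold can be computed from predicted risks. It is not the study's code (the abstract states the pipeline used SQL and R). The function names, the toy labels and risks, and the choice to flag patients whose risk is greater than or equal to the threshold are all illustrative assumptions; only the 15% threshold value comes from the abstract.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen patient with a
    complication is scored higher than one without (ties count 0.5)."""
    pos = [r for y, r in zip(y_true, risk) if y == 1]
    neg = [r for y, r in zip(y_true, risk) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], risk: Sequence[float], threshold: float = 0.15
) -> Tuple[float, float]:
    """Sensitivity and specificity when patients with predicted
    risk >= threshold are flagged as high risk (assumed convention)."""
    tp = fn = tn = fp = 0
    for y, r in zip(y_true, risk):
        flagged = r >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, NOT from the study: 1 = any complication occurred.
toy_outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_risks = [0.62, 0.31, 0.18, 0.09, 0.22, 0.12, 0.08, 0.05, 0.03, 0.02]

toy_auc = auc(toy_outcomes, toy_risks)
toy_sens, toy_spec = sensitivity_specificity(toy_outcomes, toy_risks, 0.15)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

The pairwise AUC loop is quadratic in cohort size and is only meant to show the definition; for a cohort of the size described here, a rank-based implementation from a statistics library would be the practical choice.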

CONCLUSIONS

Extracting and curating a large, local institution's EHR data for machine learning purposes resulted in models with strong predictive performance. These models can be used in clinical settings as decision support tools for identification of high-risk patients as well as patient evaluation and care management. Further work is necessary to evaluate the impact of the Pythia risk calculator within the clinical workflow on postoperative outcomes and to optimize this data flow for future machine learning efforts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c561/6258507/41b6cb05c1f4/pmed.1002701.g001.jpg
