Department of Medicine, Stanford University, Stanford, CA 94305, United States.
Department of Surgery, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, United States.
J Am Med Inform Assoc. 2024 Apr 19;31(5):1051-1061. doi: 10.1093/jamia/ocae028.
Predictive models show promise in healthcare, but their successful deployment is challenging because of limited generalizability. Current external validation often focuses on model performance using a restricted feature set carried over from the original training data, offering little insight into a model's suitability at external sites. Our study introduces a methodology for evaluating features during both the development and validation phases, focusing on creating and validating predictive models of post-surgery patient outcomes with improved generalizability.
Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts.
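The contrast between local and cross-site feature selection can be illustrated with a minimal sketch. This is not the study's OHDSI pipeline; the simulated site data, feature counts, and the L1-based selector are all illustrative assumptions. The idea shown is that features are selected independently at each site and pooled, rather than fixed by the development site alone.

```python
# Hypothetical sketch of local vs cross-site feature selection.
# Simulated data stands in for the OMOP-mapped EHR cohorts.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def select_features(X, y):
    """Select features at one site via L1-penalized logistic regression."""
    selector = SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
        threshold="mean",
    )
    selector.fit(X, y)
    return set(np.flatnonzero(selector.get_support()))

# Simulated sites: 200 patients x 20 candidate features each.
sites = {name: (rng.normal(size=(200, 20)), rng.integers(0, 2, 200))
         for name in ["US", "UK", "FI", "KR"]}

# Local selection uses only the development site (US here);
# cross-site selection pools features chosen at every site.
local = select_features(*sites["US"])
cross_site = set.union(*(select_features(X, y) for X, y in sites.values()))
print(len(local), len(cross_site))
```

By construction the cross-site set contains the local set, so a model trained on it can use predictors that matter only at external sites.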
Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients, with POU prevalence of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 in local validation and an average of 0.69 (SD = 0.02) in external validation. In external validation, models trained with cross-site feature selection significantly outperformed those using features selected only at the development site (P < .05).
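The evaluation scheme above (train at one site, report mean and SD of AUROC across external sites) can be sketched as follows. The data generator, cohort sizes, and regularization strength are assumptions for illustration, not the study's actual cohorts or hyperparameters.

```python
# Minimal sketch of the external-validation scheme: a Lasso logistic
# regression is fit at a development site and scored by AUROC at
# several simulated external sites.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def make_site(n, n_features=10, signal=1.0):
    """Simulate one site's cohort with a shared outcome mechanism."""
    X = rng.normal(size=(n, n_features))
    logits = signal * X[:, 0] - 0.5 * X[:, 1]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return X, y

# Develop at one site with L1 (Lasso) regularization.
X_dev, y_dev = make_site(2000)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_dev, y_dev)

# Validate on separate external cohorts; summarize AUROC.
aurocs = []
for n in [1500, 1200, 700, 400]:  # hypothetical external cohort sizes
    X_ext, y_ext = make_site(n, signal=0.8)
    aurocs.append(roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
print(f"external AUROC {np.mean(aurocs):.2f} (SD={np.std(aurocs):.2f})")
```

The SD across external sites, as reported above, quantifies how stable discrimination is when the model moves between clinical settings.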
Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.