Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA.
J Am Med Inform Assoc. 2012 Jun;19(e1):e162-9. doi: 10.1136/amiajnl-2011-000583. Epub 2012 Feb 28.
Electronic health records (EHRs) can be used to generate large cohorts of individuals with a given disease for clinical and genomic research. A rate-limiting step is the development of electronic phenotype selection algorithms to find such cohorts. This study evaluated the portability of a published phenotype algorithm for identifying rheumatoid arthritis (RA) patients in the EHRs of three institutions with different EHR systems.
Physicians reviewed charts from three institutions to identify patients with RA. Each institution compiled attributes from various sources in its EHR, including codified data and clinical narratives; the narratives were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with that of locally retrained models.
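As a minimal sketch of what applying a published model at a new site involves, assuming the model is a logistic regression over locally extracted features: the coefficients travel unchanged, while each site computes the feature values from its own codified data and NLP output. The feature names, the log(1 + count) transform, and every coefficient below are hypothetical placeholders, not the published Partners model.

```python
import math

# Hypothetical coefficients standing in for a published model.
# These are NOT the actual Partners RA model's features or weights.
PUBLISHED_MODEL = {
    "intercept": -4.0,
    "coef": {
        "log_ra_icd_count": 1.2,     # log(1 + count of RA billing codes)
        "log_nlp_ra_mentions": 0.9,  # log(1 + NLP-detected RA mentions in notes)
        "has_ra_drug": 1.5,          # 1 if an RA medication is recorded, else 0
    },
}


def features(ra_icd_count, nlp_ra_mentions, has_ra_drug):
    """Build one patient's feature vector from a site's locally extracted data."""
    return {
        "log_ra_icd_count": math.log1p(ra_icd_count),
        "log_nlp_ra_mentions": math.log1p(nlp_ra_mentions),
        "has_ra_drug": 1.0 if has_ra_drug else 0.0,
    }


def predict_probability(model, x):
    """Logistic regression: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = model["intercept"] + sum(b * x[name] for name, b in model["coef"].items())
    return 1.0 / (1.0 + math.exp(-z))


# Example: score a patient with 5 RA codes, 3 NLP mentions, and an RA drug.
print(predict_probability(PUBLISHED_MODEL, features(5, 3, True)))
```

In this framing, local retraining would keep the feature definitions but refit the intercept and coefficients on the site's own chart-reviewed labels.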
When the previously published model from Partners Healthcare was applied to datasets from Northwestern and Vanderbilt Universities, the area under the receiver operating characteristic curve (AUC) was 92% at Northwestern and 95% at Vanderbilt, compared with 97% at Partners. Retraining the model improved the average sensitivity, at a specificity of 97%, from 65% to 72%. Both the original logistic regression models and the locally retrained models were superior to simple billing code count thresholds.
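The two metrics reported above can be illustrated with a short sketch, assuming the standard definitions rather than the study's exact evaluation code: AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case, and sensitivity at a fixed specificity as the best sensitivity among score thresholds that keep specificity at or above the target (0.97 here). An AUC of 92% corresponds to 0.92 on this scale. The same functions can evaluate a billing-code baseline by passing raw code counts as the scores.

```python
def auc(scores, labels):
    """ROC AUC: P(random case outscores random non-case); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, target_spec=0.97):
    """Best sensitivity over thresholds 'score >= t' with specificity >= target_spec."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0  # a threshold above every score gives sensitivity 0, specificity 1
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        specificity = (n_neg - fp) / n_neg
        if specificity >= target_spec:
            best = max(best, tp / n_pos)
    return best


# Toy data (not from the study): model probabilities and chart-review labels.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels), sensitivity_at_specificity(scores, labels))
```

The pairwise AUC loop is quadratic, which is acceptable for chart-reviewed sets of a few hundred patients; a rank-based formula scales better to larger cohorts.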
These results show that a previously published algorithm for RA is portable to two external hospitals using different EHR systems, different NLP systems, and different target NLP vocabularies. Retraining the algorithm primarily increased the sensitivity at each site.
Electronic phenotype algorithms allow rapid identification of case populations across multiple sites with little retraining.