Cohort profile: St. Michael's Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing.

Affiliations

MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Unity Health Toronto, Toronto, Ontario, Canada.

Department of Medicine, University of Toronto, Toronto, Ontario, Canada.

Publication information

PLoS One. 2021 Mar 3;16(3):e0247872. doi: 10.1371/journal.pone.0247872. eCollection 2021.

Abstract

BACKGROUND

Tuberculosis (TB) is a major cause of death worldwide. TB research draws heavily on clinical cohorts, which can be generated using electronic health records (EHR), but granular information extracted from unstructured EHR data is limited. The St. Michael's Hospital TB database (SMH-TB) was established to address gaps in EHR-derived TB clinical cohorts and provide researchers and clinicians with detailed, granular data related to TB management and treatment.

METHODS

We collected and validated multiple layers of EHR data from the TB outpatient clinic at St. Michael's Hospital, Toronto, Ontario, Canada to generate the SMH-TB database. SMH-TB contains structured data taken directly from the EHR, and variables generated using natural language processing (NLP) by extracting relevant information from free text in clinic, radiology, and other notes. NLP performance was assessed using recall, precision and F1 score averaged across variable labels. We present characteristics of the cohort population using binomial proportions and 95% confidence intervals (CI), with and without adjusting for NLP misclassification errors.
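The label-averaged metrics mentioned above can be illustrated with a minimal sketch. It assumes macro-averaging (each label weighted equally), which is one common reading of "averaged across variable labels"; the abstract does not state the exact averaging scheme. The function name `macro_scores` and the toy labels are hypothetical and are not taken from SMH-TB.

```python
from collections import Counter


def macro_scores(gold, predicted):
    """Return (precision, recall, f1), each macro-averaged over labels.

    Assumption: every label counts equally (macro average). A label with
    no predictions gets precision 0.0; a label with no gold instances
    gets recall 0.0.
    """
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        prec = tp[label] / pred_total if pred_total else 0.0
        rec = tp[label] / gold_total if gold_total else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example (illustrative labels, not study data).
gold = ["active", "latent", "none", "active"]
pred = ["active", "latent", "latent", "active"]
precision, recall, f1 = macro_scores(gold, pred)
```

Macro-averaging keeps a rare label from being swamped by frequent ones; a micro-average (pooling counts across labels before computing the metrics) would be the alternative if label frequency should drive the score.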

RESULTS

SMH-TB currently contains retrospective patient data spanning 2011 to 2018, for a total of 3298 patients (N = 3237 with at least 1 associated dictation). Performance of TB diagnosis and medication NLP rulesets surpasses 93% in recall, precision and F1 metrics, indicating good generalizability. We estimated 20% (95% CI: 18.4-21.2%) were diagnosed with active TB and 46% (95% CI: 43.8-47.2%) were diagnosed with latent TB. After adjusting for potential misclassification, the proportion of patients diagnosed with active and latent TB was 18% (95% CI: 16.8-19.7%) and 40% (95% CI: 37.8-41.6%) respectively.
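As a rough illustration of how a binomial proportion, its 95% CI, and a misclassification-adjusted proportion can be computed, the sketch below uses a normal-approximation (Wald) interval and the Rogan-Gladen estimator. Both are assumptions chosen for illustration: the abstract does not say which interval or adjustment method the authors used, and the counts, sensitivity and specificity shown are hypothetical values, not SMH-TB results.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Binomial proportion with a normal-approximation (Wald) interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def rogan_gladen(p_observed, sensitivity, specificity):
    """Adjust an observed proportion for classifier misclassification.

    Rogan-Gladen estimator:
        p_true = (p_obs + specificity - 1) / (sensitivity + specificity - 1)
    The result is clipped to [0, 1].
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (p_observed + specificity - 1) / denom
    return min(1.0, max(0.0, adjusted))


# Hypothetical inputs for illustration only (not study data).
p_hat, lower, upper = wald_ci(successes=200, n=1000)
p_adjusted = rogan_gladen(p_hat, sensitivity=0.95, specificity=0.98)
```

When specificity is below 1, some patients labelled positive by the ruleset are false positives, so the adjusted proportion can fall below the observed one, which is the direction of the shift reported in the results above.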

CONCLUSION

SMH-TB is a unique database that includes a breadth of structured data derived from both structured and unstructured EHR data, the latter extracted using NLP rulesets. The data are available for a variety of research applications, such as clinical epidemiology, quality improvement and mathematical modeling studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3259/7928444/f17e0f35c58e/pone.0247872.g001.jpg
