A Framework (SOCRATex) for Hierarchical Annotation of Unstructured Electronic Health Records and Integration Into a Standardized Medical Database: Development and Usability Study.

Author information

Park Jimyung, You Seng Chan, Jeong Eugene, Weng Chunhua, Park Dongsu, Roh Jin, Lee Dong Yun, Cheong Jae Youn, Choi Jin Wook, Kang Mira, Park Rae Woong

Affiliations

Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea.

Department of Preventive Medicine and Public Health, Yonsei University College of Medicine, Seoul, Republic of Korea.

Publication information

JMIR Med Inform. 2021 Mar 30;9(3):e23983. doi: 10.2196/23983.

Abstract

BACKGROUND

Although electronic health records (EHRs) have been widely used in secondary assessments, clinical documents are relatively less utilized owing to the lack of standardized clinical text frameworks across different institutions.

OBJECTIVE

This study aimed to develop a framework for processing unstructured clinical documents in EHRs and integrating them with standardized structured data.

METHODS

We developed a framework known as Staged Optimization of Curation, Regularization, and Annotation of clinical text (SOCRATex). SOCRATex has the following four aspects: (1) extracting clinical notes for the target population and preprocessing the data, (2) defining the annotation schema with a hierarchical structure, (3) performing document-level hierarchical annotation using the annotation schema, and (4) indexing annotations for a search engine system. To test the usability of the proposed framework, proof-of-concept studies were performed on EHRs. We defined three distinct patient groups and extracted their clinical documents (ie, pathology reports, radiology reports, and admission notes). The documents were annotated and integrated into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. The annotations were then used to build Cox proportional hazards models in three clinical analysis settings to measure (1) all-cause mortality, (2) thyroid cancer recurrence, and (3) 30-day hospital readmission.
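As a rough illustration of the four aspects listed above, the following Python sketch preprocesses notes, fills a hierarchical annotation schema at the document level, and builds a small inverted index for querying the annotations. It is not the SOCRATex implementation. The schema fields (`tumor`, `lymph_node`, `lymphovascular_invasion`), the regular-expression rules, the note IDs, and all helper names are hypothetical. The rule-based `annotate` function stands in for manual annotation, and an in-memory dictionary stands in for a real search engine.

```python
import re
from collections import defaultdict

# Aspect 2: a hypothetical hierarchical annotation schema for pathology reports.
# Top-level categories hold attributes, each with its set of allowed values.
SCHEMA = {
    "tumor": {"histology": {"adenocarcinoma", "other"}},
    "lymph_node": {"metastasis": {"present", "absent"}},
    "lymphovascular_invasion": {"status": {"present", "absent"}},
}

# Hypothetical rules: (pattern, category, attribute, value). Negated patterns come
# first so that "no lymph node metastasis" is not also read as "present".
RULES = [
    (r"\bno (?:lymph )?node metastasis\b", "lymph_node", "metastasis", "absent"),
    (r"\b(?:lymph )?node metastasis\b", "lymph_node", "metastasis", "present"),
    (r"\bno lymphovascular invasion\b", "lymphovascular_invasion", "status", "absent"),
    (r"\blymphovascular invasion\b", "lymphovascular_invasion", "status", "present"),
    (r"\badenocarcinoma\b", "tumor", "histology", "adenocarcinoma"),
]


def preprocess(text):
    """Aspect 1: collapse whitespace and lowercase an extracted note."""
    return re.sub(r"\s+", " ", text).strip().lower()


def annotate(text):
    """Aspect 3: fill the schema for one document (toy stand-in for manual annotation)."""
    ann = {}
    for pattern, category, attribute, value in RULES:
        if attribute in ann.get(category, {}):
            continue  # an earlier, more specific rule already filled this slot
        if re.search(pattern, text):
            ann.setdefault(category, {})[attribute] = value
    # Reject any value the schema does not allow.
    for category, attrs in ann.items():
        for attribute, value in attrs.items():
            if value not in SCHEMA[category][attribute]:
                raise ValueError(f"{category}.{attribute}={value} not in schema")
    return ann


def build_index(annotations):
    """Aspect 4: map each 'category.attribute=value' term to the note IDs carrying it."""
    index = defaultdict(set)
    for note_id, ann in annotations.items():
        for category, attrs in ann.items():
            for attribute, value in attrs.items():
                index[f"{category}.{attribute}={value}"].add(note_id)
    return index


def search(index, *terms):
    """Return the note IDs whose annotations match every query term (AND query)."""
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits) if hits else set()


# Usage with two synthetic pathology notes keyed by hypothetical note IDs.
notes = {
    101: "Adenocarcinoma of the sigmoid colon.\nLymph node metastasis (2/15).\n"
         "Lymphovascular invasion present.",
    102: "Adenocarcinoma, rectum. No lymph node metastasis.  No lymphovascular invasion.",
}
annotations = {nid: annotate(preprocess(text)) for nid, text in notes.items()}
index = build_index(annotations)
print(search(index, "lymph_node.metastasis=present",
             "lymphovascular_invasion.status=present"))  # {101}
```

Keeping annotations as nested category/attribute/value records preserves the hierarchy, while the flattened terms allow simple queries; a production system would pass those terms to a full-text search engine rather than a dictionary.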

RESULTS

Overall, 1055 clinical documents of 953 patients were extracted and annotated using the defined annotation schemas. The generated annotations were indexed into an unstructured textual data repository. Using the annotations of pathology reports, we found that lymph node metastasis and lymphovascular tumor invasion were associated with all-cause mortality among patients with colorectal cancer (both P=.02). The other analyses, which measured thyroid cancer recurrence using radiology reports and 30-day hospital readmission of patients with depressive disorder using admission notes, also yielded results consistent with previous findings.
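To show how a document-level annotation can feed a survival analysis like the ones above, the sketch below turns one annotation attribute into a binary covariate and fits a single-covariate Cox proportional hazards model by Newton-Raphson on the partial likelihood. It is not the study's analysis code: the cohort is synthetic, the helper names `covariate` and `cox_fit_1d` are hypothetical, and the fit assumes no tied event times. A real analysis would typically use an established survival package (for example, `CoxPHFitter` from the Python `lifelines` library), which handles ties, multiple covariates, and standard errors.

```python
import math


def covariate(annotation, category, attribute, positive="present"):
    """Hypothetical helper: 1 if the annotation records the positive value, else 0."""
    return int(annotation.get(category, {}).get(attribute) == positive)


def cox_fit_1d(times, events, x, max_iter=50, tol=1e-10):
    """Estimate the log hazard ratio for one covariate via the Cox partial likelihood.

    Assumes no tied event times. events[i] is 1 for an observed event, 0 if censored.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(len(times)):
            if not events[i]:
                continue  # censored subjects contribute only through risk sets
            # Risk set: subjects still under observation at this event time.
            risk = [j for j in range(len(times)) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean = s1 / s0
            score += x[i] - mean         # first derivative of log partial likelihood
            info += s2 / s0 - mean ** 2  # negative second derivative
        if info == 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta


# Synthetic cohort: (follow-up time, death indicator, document-level annotation).
cohort = [
    (1, 1, {"lymph_node": {"metastasis": "present"}}),
    (2, 1, {"lymph_node": {"metastasis": "absent"}}),
    (3, 1, {"lymph_node": {"metastasis": "present"}}),
    (4, 1, {"lymph_node": {"metastasis": "absent"}}),
    (5, 1, {"lymph_node": {"metastasis": "present"}}),
    (6, 1, {"lymph_node": {"metastasis": "absent"}}),
]
times = [t for t, _, _ in cohort]
events = [d for _, d, _ in cohort]
x = [covariate(a, "lymph_node", "metastasis") for _, _, a in cohort]
beta = cox_fit_1d(times, events, x)
print(f"log hazard ratio = {beta:.3f}, hazard ratio = {math.exp(beta):.2f}")
```

In this toy cohort, patients whose notes record metastasis tend to die earlier, so the fitted log hazard ratio is positive. In an OMOP-CDM database, annotations keyed by `note_id` would be linked to patients through the `person_id` column of the NOTE table before building such covariates.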

CONCLUSIONS

We propose a framework for hierarchical annotation of textual data and integration into a standardized OMOP-CDM medical database. The proof-of-concept studies demonstrated that our framework can effectively process and integrate diverse clinical documents with standardized structured data for clinical research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/8044740/abcffe9fd8a7/medinform_v9i3e23983_fig1.jpg
