On the use of natural language processing to implement the target trial framework using unstructured data from the electronic health record.

Author information

Rafalko Nicole, Gianfrancesco Milena, Goldstein Neal D

Affiliations

Department of Epidemiology and Biostatistics, Drexel University Dornsife School of Public Health, Philadelphia, PA, USA.

Division of Rheumatology, School of Medicine, University of California, San Francisco, CA, USA.

Publication information

Glob Epidemiol. 2025 May 8;9:100204. doi: 10.1016/j.gloepi.2025.100204. eCollection 2025 Jun.

Abstract

The increasing availability and accessibility of electronic health record (EHR) data have made the EHR a rich secondary source for comparative effectiveness studies. To perform such studies, many researchers are turning to the target trial framework (TTF) to emulate a hypothetical randomized clinical trial. The quality of this emulation depends, in part, on the availability and accessibility of data for each component of the TTF. Yet one overarching challenge with EHR data is that unstructured fields, such as clinical encounter notes, contain copious patient detail that requires additional extraction steps before it can be used in a study. Natural language processing (NLP) spans a spectrum of methods that help automate this extraction, from simpler rule-based methods to machine learning and artificial intelligence approaches that can handle complex language structures. What follows is a discussion of how NLP methods can augment the information and data available to researchers who use EHR data and the TTF to estimate a treatment effect by emulating a hypothetical clinical trial. We conclude with recommendations for researchers interested in using NLP methods to obtain data stored in the free text of the EHR, as well as considerations regarding the quality and validity of these data for the TTF.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c18/12140070/1eceaf77d4dd/gr1.jpg
