Active learning for extracting rare adverse events from electronic health records: A study in pediatric cardiology.

Author information

Quennelle Sophie, Malekzadeh-Milani Sophie, Garcelon Nicolas, Faour Hassan, Burgun Anita, Faviez Carole, Tsopra Rosy, Bonnet Damien, Neuraz Antoine

Affiliations

Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Inria, équipe HeKA, PariSantéCampus, Paris, France; M3C-Necker, Hôpital Universitaire Necker-Enfants malades, Assistance Publique-Hôpitaux de Paris, Paris, France; Université Paris Cité, Paris, France.

M3C-Necker, Hôpital Universitaire Necker-Enfants malades, Assistance Publique-Hôpitaux de Paris, Paris, France.

Publication information

Int J Med Inform. 2025 Mar;195:105761. doi: 10.1016/j.ijmedinf.2024.105761. Epub 2024 Dec 12.

Abstract

OBJECTIVE

Automate the extraction of adverse events from the text of electronic medical records of patients hospitalized for cardiac catheterization.

METHODS

We focused on events related to cardiac catheterization as defined by the NCDR-IMPACT registry. These events were extracted from the Necker Children's Hospital data warehouse. Electronic health records were pre-screened using regular expressions. The resulting datasets contained numerous false-positive sentences, which a cardiologist annotated using an active learning process. A deep learning text classifier was then trained on this active learning-annotated dataset to accurately identify patients who had suffered a serious adverse event.
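As an illustration only, the pre-screening and annotation-selection steps might be sketched as follows. The abstract does not give the study's regular expressions or name its active learning strategy, so the patterns, the English example sentences, the function names, and the choice of uncertainty sampling are all assumptions made for this sketch.

```python
import re

# Hypothetical patterns: the study's actual regular expressions are not
# given in the abstract, and the source records are not English.
PATTERNS = [
    re.compile(r"\barrhythmi\w*", re.IGNORECASE),
    re.compile(r"\btamponade\b", re.IGNORECASE),
    re.compile(r"\bcardiac arrest\b", re.IGNORECASE),
]


def prescreen(sentences):
    """Keep every sentence that matches at least one pattern.

    This favors recall over precision, so negated or historical
    mentions pass through as false positives.
    """
    return [s for s in sentences if any(p.search(s) for p in PATTERNS)]


def select_for_annotation(sentences, predict_proba, batch_size):
    """Uncertainty sampling (an assumed strategy): return the sentences
    whose predicted probability of being a true event is closest to 0.5."""
    ranked = sorted(sentences, key=lambda s: abs(predict_proba(s) - 0.5))
    return ranked[:batch_size]


# Toy usage with made-up sentences (not study data).
notes = [
    "No tamponade observed after the procedure.",
    "Patient discharged on day 2.",
    "Transient arrhythmia during catheter advancement.",
]
candidates = prescreen(notes)


def toy_proba(sentence):
    # Placeholder scorer standing in for the current classifier.
    return 0.45 if sentence.startswith("No ") else 0.9


to_label = select_for_annotation(candidates, toy_proba, batch_size=1)
```

In this toy run the negated mention survives pre-screening as a false positive and is the sentence the scorer is least certain about, so it is the one sent to the annotator first.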

RESULTS

The dataset included 2,980 patients. Regular-expression-based extraction of adverse events related to cardiac catheterization achieved perfect recall. Because adverse events are rare, the dataset obtained from this initial pre-screening step was imbalanced and contained a large number of false positives. The active learning annotation enabled the acquisition of a representative dataset suitable for training a deep learning model. The deep learning text classifier identified patients who experienced adverse events after cardiac catheterization with a recall of 0.78 and a specificity of 0.94.
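For readers less familiar with the two reported metrics, the sketch below shows how recall and specificity are computed from patient-level labels. The labels are invented for illustration and are not the study's data; the function name is likewise an assumption.

```python
def recall_and_specificity(y_true, y_pred):
    """Compute recall (sensitivity) and specificity for binary labels,
    where 1 means an adverse event and 0 means no adverse event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # recall = TP / (TP + FN); specificity = TN / (TN + FP)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


# Made-up labels for ten patients (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
recall, specificity = recall_and_specificity(y_true, y_pred)
```

With rare events, specificity is driven by the large negative class, which is why it is reported alongside recall rather than in place of it.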

CONCLUSION

Our model effectively identified patients who experienced adverse events related to cardiac catheterization using real clinical data. Enabled by an active learning annotation process, it shows promise for large language model applications in clinical research, especially for rare diseases with limited annotated databases. Our model's strength lies in its development by physicians for physicians, ensuring its relevance and applicability in clinical practice.
