Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study.

Authors

Gérardin Christel, Mageau Arthur, Mékinian Arsène, Tannier Xavier, Carrat Fabrice

Affiliations

Institute Pierre Louis Epidemiology and Public Health, Institut National de la Santé et de la Recherche Médicale, Sorbonne Université, Paris, France.

Institut National de la Santé et de la Recherche Médicale, Unité Mixte de Recherche 1137 Infection Antimicrobials Modelling Evolution, Team Decision Sciences in Infectious Diseases, Université Paris Cité, Paris, France.

Publication information

JMIR Med Inform. 2022 Dec 19;10(12):e42379. doi: 10.2196/42379.

DOI:10.2196/42379
PMID:36534446
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9808583/
Abstract

BACKGROUND

Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English.

OBJECTIVE

We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases.

METHODS

Our multistep algorithm includes a named-entity recognition step, a multilabel classification using the Medical Subject Headings (MeSH) ontology, and the computation of patient similarity. Cohorts of similar patients were then selected for phenotypes annotated a priori. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, both with annotated phenotypes and both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the 3 patients closest to the index patient using precision-at-3, and we also computed recall and average precision.
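The last step of the pipeline, ranking patients by similarity to an index patient, can be illustrated with a minimal sketch. The abstract does not state which similarity measure or patient representation was used, so the cosine similarity, the binary concept vectors, and the names `cosine_similarity` and `closest_patients` below are all assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)

def closest_patients(index_id, vectors, k=3):
    """Return the ids of the k patients most similar to the index patient.

    `vectors` maps a patient id to a concept vector (hypothetical format:
    one 0/1 entry per medical concept label assigned to the patient).
    """
    index_vec = vectors[index_id]
    scored = [
        (cosine_similarity(index_vec, vec), pid)
        for pid, vec in vectors.items()
        if pid != index_id
    ]
    # Highest similarity first; ties are broken by patient id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:k]]

# Toy data: four candidate patients described by four hypothetical concepts.
vectors = {
    "index": [1, 1, 0, 0],
    "a": [1, 1, 0, 0],
    "b": [1, 0, 0, 0],
    "c": [0, 0, 1, 1],
    "d": [1, 1, 1, 0],
}
top3 = closest_patients("index", vectors)
print(top3)
```

With this toy data, patient `a` shares the index profile exactly, `d` shares two of its three concepts, and `b` shares one, so they are returned in that order while `c` is left out.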

RESULTS

For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). The P5 and P6 phenotypes could not be analyzed because too few occurrences of these phenotypes were available.
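For readers unfamiliar with the ranking metrics reported here, the following sketch shows their standard information-retrieval definitions. It is not the authors' evaluation code: the function names, the recall cutoff, and the toy ranking are assumptions, and the abstract does not say how the confidence intervals were obtained, so that part is not reproduced.

```python
def precision_at_k(ranked_relevance, k=3):
    """Fraction of the top-k ranked patients that carry the phenotype."""
    # Divides by k even if fewer than k patients were ranked.
    return sum(ranked_relevance[:k]) / k

def recall_at_k(ranked_relevance, k, total_relevant):
    """Fraction of all phenotype-positive patients found in the top k."""
    if total_relevant == 0:
        return 0.0
    return sum(ranked_relevance[:k]) / total_relevant

def average_precision(ranked_relevance):
    """Mean of the precision values at each rank holding a relevant patient.

    Assumes every relevant patient appears somewhere in the ranking.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Toy ranking for one index patient: True means the ranked patient
# has the same annotated phenotype as the index patient.
ranking = [True, False, True, True, False]
p3 = precision_at_k(ranking, k=3)
r3 = recall_at_k(ranking, k=3, total_relevant=3)
ap = average_precision(ranking)
print(p3, r3, ap)
```

In this toy ranking, two of the first three patients are relevant, so precision-at-3 and recall-at-3 are both 2/3, and the average precision is the mean of 1/1, 2/3, and 3/4.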

CONCLUSIONS

Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/dee183778e45/medinform_v10i12e42379_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/fa7f3f5af050/medinform_v10i12e42379_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/fe7e472b0553/medinform_v10i12e42379_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/f82d8e62beee/medinform_v10i12e42379_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/5f2df017648b/medinform_v10i12e42379_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/fa7b0ce5b153/medinform_v10i12e42379_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/9df5462f0524/medinform_v10i12e42379_fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/874c084490b4/medinform_v10i12e42379_fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f97/9808583/723131c5bb4d/medinform_v10i12e42379_fig9.jpg
