Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts.

Affiliations

Human Language and Accessibility Technologies, Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911, Madrid, Spain.

Tissue Engineering and Regenerative Medicine group, Department of Bioengineering, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911, Madrid, Spain.

Publication information

BMC Bioinformatics. 2022 Jul 6;23(1):263. doi: 10.1186/s12859-022-04810-y.

DOI:10.1186/s12859-022-04810-y
PMID:35794528
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9258216/
Abstract

BACKGROUND AND OBJECTIVE

Although rare diseases are characterized by low prevalence, approximately 400 million people are affected by a rare disease. The early and accurate diagnosis of these conditions is a major challenge for general practitioners, who do not have enough knowledge to identify them. In addition, rare diseases usually show a wide variety of manifestations, which can make diagnosis even more difficult. A delayed diagnosis can negatively affect the patient's life. Therefore, there is an urgent need to increase scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) and Deep Learning can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment.

METHODS

The paper explores several deep learning techniques, such as Bidirectional Long Short-Term Memory (BiLSTM) networks and deep contextualized word representations based on Bidirectional Encoder Representations from Transformers (BERT), to recognize rare diseases and their clinical manifestations (signs and symptoms).
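As a rough illustration of the sequence-labelling framing that BiLSTM and BERT taggers are commonly trained on, the sketch below decodes per-token BIO tags into entity spans. The BIO scheme, the label names `RAREDISEASE` and `SIGN`, the helper `bio_to_spans`, and the example sentence are all assumptions made for illustration; the abstract does not state which tagging scheme or label set the authors used.

```python
from typing import List, Tuple

# (label, start_token, end_token_exclusive); a hypothetical span type
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Turn a flat BIO tag sequence into contiguous entity spans."""
    spans: List[Span] = []
    label, start = None, None
    # A trailing "O" sentinel flushes a span that ends on the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label, start = None, None
        # "B-" opens a span; a stray "I-" with no open span is treated leniently.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


# Illustrative sentence and tags (not taken from the paper's corpus).
tokens = ["Marfan", "syndrome", "causes", "lens", "dislocation", "."]
tags = ["B-RAREDISEASE", "I-RAREDISEASE", "O", "B-SIGN", "I-SIGN", "O"]

for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

A flat scheme like this can only express contiguous, non-overlapping spans, so overlapped, nested or discontinuous mentions would need a richer encoding.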

RESULTS

BioBERT, a domain-specific language representation based on BERT and trained on biomedical corpora, obtains the best results with an F1 of 85.2% for rare diseases. Since many signs are usually described by complex noun phrases that involve the use of overlapped, nested and discontinuous entities, the model obtains a lower F1 of 57.2% for signs.
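For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it over entity spans under exact-match scoring, which is an assumption here: the abstract does not say whether matching was exact or relaxed. The function `span_f1` and the gold and predicted spans are hypothetical and are not results from the paper.

```python
from typing import Set, Tuple

# (label, start_token, end_token_exclusive); a hypothetical span type
Span = Tuple[str, int, int]


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) counting only exact span matches."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the last sign is predicted with a wrong boundary,
# so it counts as one false positive and one false negative.
gold = {("RAREDISEASE", 0, 2), ("SIGN", 3, 5), ("SIGN", 7, 10)}
pred = {("RAREDISEASE", 0, 2), ("SIGN", 3, 5), ("SIGN", 7, 9)}

precision, recall, f1 = span_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under exact matching, a single boundary error on a long sign phrase is penalized twice, which is one plausible reason scores for signs trail those for disease names.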

CONCLUSIONS

While our results are promising, there is still much room for improvement, especially with respect to the identification of clinical manifestations (signs and symptoms).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1cc/9258216/88f5f3de0e25/12859_2022_4810_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1cc/9258216/1efb9ab327fe/12859_2022_4810_Fig1_HTML.jpg

Similar articles

1
Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts.
BMC Bioinformatics. 2022 Jul 6;23(1):263. doi: 10.1186/s12859-022-04810-y.
2
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.
BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.
3
Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.
J Med Internet Res. 2021 Jan 12;23(1):e19689. doi: 10.2196/19689.
4
BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
5
Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
6
The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms.
J Biomed Inform. 2022 Jan;125:103961. doi: 10.1016/j.jbi.2021.103961. Epub 2021 Dec 5.
7
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)-Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study.
JMIR Med Inform. 2019 Sep 12;7(3):e14830. doi: 10.2196/14830.
8
Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text.
J Med Syst. 2020 Feb 28;44(4):77. doi: 10.1007/s10916-020-1542-8.
9
BioBERT and Similar Approaches for Relation Extraction.
Methods Mol Biol. 2022;2496:221-235. doi: 10.1007/978-1-0716-2305-3_12.
10
Adversarial active learning for the identification of medical concepts and annotation inconsistency.
J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.

Cited by

1
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition.
AMIA Annu Symp Proc. 2025 May 22;2024:748-757. eCollection 2024.
2
Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models.
J Healthc Inform Res. 2024 Jan 5;8(2):438-461. doi: 10.1007/s41666-023-00155-0. eCollection 2024 Jun.
3
Year 2022 in Medical Natural Language Processing: Availability of Language Models as a Step in the Democratization of NLP in the Biomedical Area.
Yearb Med Inform. 2023 Aug;32(1):244-252. doi: 10.1055/s-0043-1768752. Epub 2023 Dec 26.
4
Extract antibody and antigen names from biomedical literature.
BMC Bioinformatics. 2022 Dec 6;23(1):524. doi: 10.1186/s12859-022-04993-4.

References

1
Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach.
PeerJ Comput Sci. 2022 Mar 7;8:e913. doi: 10.7717/peerj-cs.913. eCollection 2022.
2
Multisystemic Manifestations in Rare Diseases: The Experience of Dyskeratosis Congenita.
Genes (Basel). 2022 Mar 11;13(3):496. doi: 10.3390/genes13030496.
3
Deep learning with language models improves named entity recognition for PharmaCoNER.
BMC Bioinformatics. 2021 Dec 17;22(Suppl 1):602. doi: 10.1186/s12859-021-04260-y.
4
The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms.
J Biomed Inform. 2022 Jan;125:103961. doi: 10.1016/j.jbi.2021.103961. Epub 2021 Dec 5.
5
Context-aware multi-token concept recognition of biological entities.
BMC Bioinformatics. 2021 Oct 21;22(Suppl 11):337. doi: 10.1186/s12859-021-04248-8.
6
Overview of the ChILD Research Network: A roadmap for progress and success in defining rare diseases.
Pediatr Pulmonol. 2020 Jul;55(7):1819-1827. doi: 10.1002/ppul.24808.
7
How many rare diseases are there?
Nat Rev Drug Discov. 2020 Feb;19(2):77-78. doi: 10.1038/d41573-019-00180-y.
8
Medical students' knowledge and opinions about rare diseases: A case study from Poland.
Intractable Rare Dis Res. 2019 Nov;8(4):252-259. doi: 10.5582/irdr.2019.01099.
9
Biomedical named entity recognition using deep neural networks with contextual information.
BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.
10
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)-Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study.
JMIR Med Inform. 2019 Sep 12;7(3):e14830. doi: 10.2196/14830.