DISNET: a framework for extracting phenotypic disease information from public sources.

Authors

Lagunes-García Gerardo, Rodríguez-González Alejandro, Prieto-Santamaría Lucía, García Del Valle Eduardo P, Zanin Massimiliano, Menasalvas-Ruiz Ernestina

Affiliations

Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain.

Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.

Publication information

PeerJ. 2020 Feb 17;8:e8580. doi: 10.7717/peerj.8580. eCollection 2020.

DOI: 10.7717/peerj.8580
PMID: 32110491
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7032061/

Abstract

BACKGROUND

Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread across several information sources. Building a comprehensive dataset of diseases and their clinical manifestations from public sources makes it possible not only to complement and merge medical knowledge but also to extend it, interconnecting existing data so that diseases can be analysed and related to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge from the signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks.

METHODS

We present the main features of the DISNET system and describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia and PubMed; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques.
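
As a rough illustration of the kind of term extraction described above, the following minimal sketch matches a small vocabulary of symptom terms against free text. It is not the DISNET pipeline: the vocabulary `SYMPTOM_TERMS`, the function `extract_terms`, and the sample sentence are all hypothetical, and simple dictionary matching stands in for the text mining and natural language processing techniques the system actually combines.

```python
import re

# Hypothetical mini-vocabulary; a real system would rely on a curated
# medical terminology rather than a hand-written list.
SYMPTOM_TERMS = ["fever", "cough", "shortness of breath", "chest pain", "fatigue"]


def extract_terms(text, vocabulary=SYMPTOM_TERMS):
    """Return the vocabulary terms found in `text`, ordered by first occurrence."""
    lowered = text.lower()
    hits = []
    for term in vocabulary:
        # Whole-word match so that, for example, "cough" does not match "coughing".
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            hits.append((match.start(), term))
    return [term for _, term in sorted(hits)]


sample = "Pneumonia typically presents with cough, fever, and shortness of breath."
print(extract_terms(sample))  # ['cough', 'fever', 'shortness of breath']
```

Exact matching of this kind misses inflected forms, synonyms and negated mentions, which is one reason a validation step such as the one reported in the Results is needed.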

RESULTS

We also validate the system on Wikipedia and PubMed texts and report the accuracy obtained. The final output is a comprehensive symptom-disease dataset, freely accessible through the system's API. Finally, we describe, through some simple use cases, how a user can interact with the system and extract information that could be used for subsequent analyses.
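
One possible subsequent analysis is to turn a symptom-disease dataset into a disease network by linking diseases that share symptoms. The sketch below assumes the data have already been obtained as a mapping from disease name to a set of symptom terms; the toy data, the Jaccard similarity measure, the 0.3 threshold, and the names `jaccard` and `build_network` are illustrative assumptions, not part of the DISNET API.

```python
from itertools import combinations

# Toy disease -> symptom sets (hypothetical values, not DISNET data).
DISEASE_SYMPTOMS = {
    "influenza": {"fever", "cough", "fatigue"},
    "pneumonia": {"fever", "cough", "chest pain"},
    "migraine": {"headache", "nausea"},
}


def jaccard(a, b):
    """Jaccard similarity between two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(disease_symptoms, threshold=0.3):
    """Return {(disease_1, disease_2): similarity} for pairs at or above threshold."""
    edges = {}
    for d1, d2 in combinations(sorted(disease_symptoms), 2):
        score = jaccard(disease_symptoms[d1], disease_symptoms[d2])
        if score >= threshold:
            edges[(d1, d2)] = score
    return edges


network = build_network(DISEASE_SYMPTOMS)
print(network)  # {('influenza', 'pneumonia'): 0.5}
```

Changing the similarity measure or the threshold yields a different network, which is one simple way such networks could be customised.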

DISCUSSION

DISNET supports retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific category of diseases or clinical diagnosis terms; it covers all the categories offered by the selected information sources. It also allows tracking the evolution of those terms over time, offering an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, which suggests that it is good enough to be used to extract diseases and diagnostically relevant terms. At the same time, the evaluation revealed that improvements could be introduced to enhance the system's reliability.
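
Tracking how the terms associated with a disease change over time can be sketched as a comparison between two dated snapshots. The snapshot dates, the term sets, and the helper `diff_snapshots` below are hypothetical and only illustrate the idea; they do not reflect how DISNET stores or exposes its periodic extractions.

```python
# Hypothetical snapshots of the terms extracted for one disease on two dates.
SNAPSHOTS = {
    "2018-02-01": {"fever", "cough"},
    "2019-02-01": {"fever", "cough", "fatigue"},
}


def diff_snapshots(old_terms, new_terms):
    """Report which terms appeared and which disappeared between two snapshots."""
    return {
        "added": sorted(new_terms - old_terms),
        "removed": sorted(old_terms - new_terms),
    }


changes = diff_snapshots(SNAPSHOTS["2018-02-01"], SNAPSHOTS["2019-02-01"])
print(changes)  # {'added': ['fatigue'], 'removed': []}
```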
