Lagunes-García Gerardo, Rodríguez-González Alejandro, Prieto-Santamaría Lucía, García Del Valle Eduardo P, Zanin Massimiliano, Menasalvas-Ruiz Ernestina
Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain.
Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
PeerJ. 2020 Feb 17;8:e8580. doi: 10.7717/peerj.8580. eCollection 2020.
Within the global endeavour to improve population health, one major challenge is identifying and integrating medical knowledge scattered across several information sources. Building a comprehensive dataset of diseases and their clinical manifestations from public sources is a promising approach: it makes it possible not only to complement and merge medical knowledge, but also to extend it, to interconnect existing data, and to analyse how diseases relate to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge from the signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks.
We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia and PubMed; specifically, texts from these sources are processed with a combination of text mining and natural language processing techniques.
We further present the validation of our system on Wikipedia and PubMed texts and report the accuracy obtained. The final output is a comprehensive symptoms-disease dataset, freely accessible through the system's API. Finally, we describe, through some simple use cases, how a user can interact with the system and extract information for subsequent analyses.
DISNET allows users to retrieve knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific disease category (it covers all the categories offered by the selected information sources) or to particular clinical diagnosis terms. It also makes it possible to track the evolution of those terms over time, offering an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, which suggests that it is good enough to be used to extract diseases and diagnostically relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system's reliability.
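As an illustration of the disease networks mentioned above, the following is a minimal sketch of how a network could be derived from disease-term associations. It does not use the DISNET API and is not the authors' method: the toy data, the function names `jaccard` and `build_network`, the use of Jaccard similarity and the `threshold` parameter are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data (NOT DISNET output): disease -> set of phenotypic terms.
disease_terms = {
    "influenza": {"fever", "cough", "myalgia"},
    "common cold": {"cough", "rhinorrhea", "sore throat"},
    "malaria": {"fever", "chills", "myalgia"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(terms_by_disease, threshold=0.2):
    """Return {(disease_1, disease_2): similarity} for pairs at or above threshold.

    The threshold is an assumed, user-chosen cut-off; changing it is one way
    such a network could be customised.
    """
    edges = {}
    for d1, d2 in combinations(sorted(terms_by_disease), 2):
        score = jaccard(terms_by_disease[d1], terms_by_disease[d2])
        if score >= threshold:
            edges[(d1, d2)] = score
    return edges


network = build_network(disease_terms)
for (d1, d2), score in network.items():
    print(f"{d1} -- {d2}: {score:.2f}")
```

With this toy input, diseases that share phenotypic terms become linked, while pairs with no terms in common (here, the common cold and malaria) remain unconnected. Other similarity measures or cut-offs could be substituted without changing the overall structure.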