Kafkas Şenay, Hoehndorf Robert
Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.
Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.
J Biomed Semantics. 2019 Sep 18;10(1):15. doi: 10.1186/s13326-019-0208-2.
Infectious diseases claim millions of lives each year, especially in developing countries. Accurate and rapid identification of causative pathogens plays a key role in successful treatment. To support research on infectious diseases and mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be used in computational studies. A large number of pathogen-disease associations are available in the literature only in unstructured form, and automated methods are needed to extract them.
We developed a text mining system for extracting pathogen-disease relations from the literature. Our approach combines background knowledge from an ontology with statistical methods to extract associations between pathogens and diseases. In total, we extracted 3420 pathogen-disease associations from the literature. We integrated these literature-derived associations into a database that links pathogens to their phenotypes to support infectious disease research.
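As a rough illustration of how dictionary-based matching and a co-occurrence statistic can be combined, the sketch below scores pathogen-disease pairs by sentence-level co-occurrence. The abstract does not name the statistic used, so normalized pointwise mutual information (NPMI) is an assumption here, as are the function name `npmi_scores`, the plain substring matching, and the short term lists standing in for ontology-derived labels.

```python
import math
from collections import Counter
from itertools import product


def npmi_scores(sentences, pathogens, diseases):
    """Score pathogen-disease pairs by sentence-level co-occurrence (NPMI).

    Hypothetical helper: the term lists stand in for ontology labels, and
    case-insensitive substring matching is a deliberate simplification.
    """
    n = len(sentences)
    pathogen_counts, disease_counts, pair_counts = Counter(), Counter(), Counter()

    for sentence in sentences:
        text = sentence.lower()
        found_pathogens = {p for p in pathogens if p.lower() in text}
        found_diseases = {d for d in diseases if d.lower() in text}
        pathogen_counts.update(found_pathogens)
        disease_counts.update(found_diseases)
        # Every pathogen and disease mentioned in the same sentence co-occur.
        pair_counts.update(product(found_pathogens, found_diseases))

    scores = {}
    for (pathogen, disease), count in pair_counts.items():
        p_xy = count / n
        p_x = pathogen_counts[pathogen] / n
        p_y = disease_counts[disease] / n
        if p_xy == 1.0:
            # The pair appears in every sentence; NPMI tends to 1 in this limit.
            scores[(pathogen, disease)] = 1.0
            continue
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(pathogen, disease)] = pmi / -math.log(p_xy)
    return scores


if __name__ == "__main__":
    example_sentences = [
        "Zika virus infection is linked to microcephaly.",
        "Plasmodium falciparum causes severe malaria.",
        "Zika virus was detected in cases of microcephaly.",
        "Malaria remains endemic in the region.",
    ]
    result = npmi_scores(
        example_sentences,
        pathogens=["Zika virus", "Plasmodium falciparum"],
        diseases=["microcephaly", "malaria"],
    )
    for pair, score in sorted(result.items()):
        print(pair, round(score, 3))
```

NPMI ranges from -1 to 1, so a cutoff on the score could be used to keep only pairs that co-occur more often than chance; any such threshold would need to be tuned on real data.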
To the best of our knowledge, this is the first study focused on extracting pathogen-disease associations from publications. We believe the text-mined data can serve as a valuable resource for infectious disease research. All data are publicly available at https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint at http://patho.phenomebrowser.net/.