Karadeniz İlknur, Hur Junguk, He Yongqun, Özgür Arzucan
Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey.
Department of Basic Sciences, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND, USA.
Front Microbiol. 2015 Dec 9;6:1386. doi: 10.3389/fmicb.2015.01386. eCollection 2015.
Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. Identifying host-Brucella interactions is crucial to understanding host immunity against Brucella infection and Brucella pathogenesis in the face of host immune responses. Most information about inter-species interactions between host and Brucella genes is available only in the text of scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few were designed with the peculiarities of host-pathogen interactions in mind. In this paper, we used a text-mining approach to extract host-Brucella gene-gene interactions from the abstracts of PubMed articles. Here, gene-gene interactions denote interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed to detect mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence-based approaches, as well as sentence-level machine-learning-based methods, all originally designed for extracting intra-species gene interactions, were used to extract interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these were identified by sentence-level processing, and an additional 22 were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure. Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur under experimental conditions that can be ontologically represented.
Our results show that the introduced literature-mining and ontology-based modeling approaches are effective in retrieving and analyzing host-pathogen gene-gene interaction networks.
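The abstract distinguishes sentence-level from abstract-level co-occurrence, which is why abstract-level processing can recover interactions that sentence-level processing misses. The sketch below is a minimal illustration of that distinction only; it is not SciMiner or the authors' pipeline. It assumes gene mentions are found by exact whole-token matching against two small hypothetical lexicons (`HOST_GENES`, `BRUCELLA_GENES`), uses a naive regex sentence splitter, and treats every host/Brucella pair mentioned together as a candidate interaction. All function names are hypothetical, and the example text is invented, not taken from PubMed. Co-occurrence only proposes candidates; in the study, the extracted interactions were manually evaluated.

```python
import re
from itertools import product

# Hypothetical toy lexicons standing in for gene/protein name recognition.
HOST_GENES = {"TLR2", "TLR4", "MyD88", "TNF"}
BRUCELLA_GENES = {"Omp25", "BtpA", "VirB"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_genes(text, lexicon):
    """Return lexicon entries appearing as whole alphanumeric tokens (case-sensitive)."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return lexicon & tokens


def cooccurring_pairs(text):
    """All (host gene, Brucella gene) pairs mentioned together in `text`."""
    hosts = find_genes(text, HOST_GENES)
    pathogens = find_genes(text, BRUCELLA_GENES)
    return set(product(hosts, pathogens))


def sentence_level(abstract):
    """Candidate pairs whose members co-occur within a single sentence."""
    pairs = set()
    for sentence in split_sentences(abstract):
        pairs |= cooccurring_pairs(sentence)
    return pairs


def abstract_level(abstract):
    """Candidate pairs whose members co-occur anywhere in the abstract."""
    return cooccurring_pairs(abstract)
```

For example, with the invented text `"Omp25 was studied alongside TNF. TLR4 levels were also measured."`, `sentence_level` yields only `("TNF", "Omp25")`, while `abstract_level` additionally yields `("TLR4", "Omp25")`. Abstract-level co-occurrence therefore always returns a superset of the sentence-level pairs: it gains recall at the cost of more false candidates, which is one reason manual evaluation is needed.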