Department of Computer Science, Oklahoma State University, Stillwater, OK, USA.
Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, USA.
Database (Oxford). 2022 Jun 30;2022. doi: 10.1093/database/baac042.
During infection, a pathogen's entry into the host organism, its breach of the host immune defense, and its spread and multiplication are frequently mediated by multiple interactions between host and pathogen proteins. The systematic study of host-pathogen interactions (HPIs) is challenging for both experimental and computational approaches, and it depends critically on the knowledge about these interactions already recorded in the biomedical literature. Several existing HPI databases manually filter HPI protein-protein interactions from generic databases and curated experimental interactomic studies, but no comprehensive database of HPIs obtained from the biomedical literature is currently available. Here, we introduce a high-throughput literature-mining platform for extracting HPI data that includes the most comprehensive collection to date of HPIs obtained from PubMed abstracts. Our HPI data portal, PHILM2Web (Pathogen-Host Interactions by Literature Mining on the Web), integrates an automatically generated database of interactions extracted by PHILM, our high-precision HPI literature-mining algorithm. Currently, the database contains 23 581 generic HPIs between 157 host and 403 pathogen organisms from 11 609 abstracts. The interactions were obtained by processing 608 972 PubMed abstracts, each mentioning at least one host organism and one pathogen organism. In response to the coronavirus disease 2019 (COVID-19) pandemic, we also used PHILM to process 25 796 PubMed abstracts obtained with the same query as the COVID-19 Open Research Dataset. This COVID-19 processing batch yielded 257 HPIs between 19 host and 31 pathogen organisms from 167 abstracts. The entire HPI dataset is accessible through a searchable PHILM2Web interface; scientists can also download the entire database in bulk for offline processing. Database URL: http://philm2web.live.