MetagenomicKG: a knowledge graph for metagenomic applications.
Author information
Ma Chunyu, Liu Shaopeng, Koslicki David
Affiliations
Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA.
Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania, USA.
Publication information
bioRxiv. 2024 Mar 15:2024.03.14.585056. doi: 10.1101/2024.03.14.585056.
MOTIVATION
The sheer volume and variety of genomic content within microbial communities make metagenomics a field rich in biomedical knowledge. To traverse these complex communities and their vast unknowns, metagenomic studies often depend on distinct reference databases, such as the Genome Taxonomy Database (GTDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), for various analytical purposes. These databases are crucial for genetic and functional annotation of microbial communities. Nevertheless, the inconsistent nomenclature and identifiers used across these databases make it difficult to integrate, represent, and use their contents effectively. Knowledge graphs (KGs) offer an appropriate solution by organizing biological entities and their interrelations into a cohesive network. The graph structure both helps reveal hidden patterns and deepens biological understanding. Although KGs have shown potential in various biomedical fields, their application in metagenomics remains underexplored.
RESULTS
We present MetagenomicKG, a novel knowledge graph specifically tailored for metagenomic analysis. MetagenomicKG integrates taxonomic, functional, and pathogenesis-related information from widely used databases, and further links these with established biomedical knowledge graphs to expand biological connections. Through several use cases, we demonstrate its utility in enabling hypothesis generation regarding the relationships between microbes and diseases, generating sample-specific graph embeddings, and providing robust pathogen prediction.
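To make the integration idea concrete, the sketch below shows one way that entities from differently named databases could be linked in a single graph and then traversed to suggest a microbe-to-disease connection. It is a toy illustration only, not MetagenomicKG's actual schema or code: every identifier, every predicate name, and the helper functions `build_graph` and `find_path` are hypothetical.

```python
from collections import defaultdict, deque

# Toy (subject, predicate, object) triples. All identifiers and predicate
# names are invented for illustration; none come from MetagenomicKG.
TRIPLES = [
    ("GTDB:genome_A", "same_as", "BVBRC:genome_1"),
    ("GTDB:genome_A", "has_function", "KEGG:ortholog_X"),
    ("KEGG:ortholog_X", "participates_in", "KEGG:pathway_P"),
    ("KEGG:pathway_P", "associated_with", "DISEASE:condition_D"),
    ("GTDB:genome_B", "has_function", "KEGG:ortholog_Y"),
]


def build_graph(triples):
    """Build an adjacency list, adding an inverse edge for every triple."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph[obj].append((f"inverse_{predicate}", subject))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for a shortest path from start to goal.

    Returns a list alternating node, predicate, node, ... or None when
    the two entities are not connected.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for predicate, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [predicate, neighbor])
    return None


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    # A genome known only by its (hypothetical) BV-BRC identifier reaches
    # the disease node through the cross-database "same_as" link.
    print(find_path(graph, "BVBRC:genome_1", "DISEASE:condition_D"))
```

In this sketch the `same_as` edge is what reconciles the two naming schemes: a query that starts from either identifier reaches the same functional and disease annotations. A real knowledge graph would typically carry typed nodes, edge provenance, and far larger scale.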
AVAILABILITY AND IMPLEMENTATION
The source code and technical details for constructing MetagenomicKG and reproducing all analyses are available on GitHub: https://github.com/KoslickiLab/MetagenomicKG. We also host a Neo4j instance at http://mkg.cse.psu.edu:7474 for accessing and querying the graph.