Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review.

Affiliations

1 Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia.

2 Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

OMICS. 2019 Mar;23(3):138-151. doi: 10.1089/omi.2018.0205.

DOI:10.1089/omi.2018.0205

PMID:30883301

Abstract

Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower researchers to analyze, integrate, and make sense of big data. There are currently a large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Accordingly, there has been growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at subnetwork resolution. In this expert review, we provide a comprehensive summary of the most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes, and Search Tool for the Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.