Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, 4301 West Markham Street (slot 516), Little Rock, AR 72205-7199, USA.
Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA.
Mol Omics. 2021 Apr 19;17(2):170-185. doi: 10.1039/d0mo00041h.
With the advancement of next-generation sequencing and mass spectrometry, there is a growing need to merge biological features in order to study a system as a whole. Features such as the transcriptome, methylome, proteome, histone post-translational modifications, and the microbiome all influence the host response to various diseases and cancers. Each of these platforms has technological limitations due to sample preparation steps, the amount of material needed for sequencing, and sequencing depth requirements. Each feature on its own provides a snapshot of one level of regulation in a system. The obvious next step is to integrate this information and learn how genes, proteins, and/or epigenetic factors influence the phenotype of a disease in the context of the system. In recent years, there has been a push for the development of data integration methods. Each method specifically integrates a subset of omics data using approaches such as conceptual integration, statistical integration, model-based integration, networks, and pathway data integration. In this review, we discuss considerations of the study design for each data feature, the limitations in gene and protein abundance and their rate of expression, the current data integration methods, and microbiome influences on gene and protein expression. The considerations discussed in this review should be taken into account when developing new algorithms for integrating multi-omics data.