Integrated Omics: Tools, Advances, and Future Approaches.

Author information

Misra Biswapriya B, Langefeld Carl D, Olivier Michael, Cox Laura A

Affiliations

B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States.

C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States.

Publication information

J Mol Endocrinol. 2018 Jul 13. doi: 10.1530/JME-18-0055.

Abstract

With the rapid adoption of high-throughput omic approaches such as genomics, transcriptomics, proteomics, and metabolomics to analyze biological samples, each analysis can generate tera- to petabyte-sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make it challenging to integrate these multi-dimensional omics data into a biologically meaningful context. The field is variously named integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics'; its challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is a holistic 'systems biology' understanding of the biological question at hand. Commonly used approaches in these efforts are currently limited by the 3 i's: integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and the increasing types of omics datasets generated, such as glycomics, lipidomics, microbiomics, and phenomics, a growing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for the development of standardized analytical pipelines that could be adopted by the global omics research community.
