Data integration in genetics and genomics: methods and challenges.

Author information

Hamid Jemila S, Hu Pingzhao, Roslin Nicole M, Ling Vicki, Greenwood Celia M T, Beyene Joseph

Affiliation

Biostatistics Methodology Unit, The Hospital for Sick Children Research Institute, 555 University Avenue, Toronto, ON, Canada M5G 1X8.

Publication information

Hum Genomics Proteomics. 2009 Jan 12;2009:869093. doi: 10.4061/2009/869093.

Abstract

Due to rapid technological advances, various types of genomic and proteomic data with different sizes, formats, and structures have become available. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interaction data. Each of these distinct data types provides a different, partly independent and complementary, view of the whole genome. However, understanding the functions of genes, proteins, and other aspects of the genome requires more information than any single dataset provides. Integrating data from different sources is, therefore, an important part of current research in genomics and proteomics. Data integration also plays an important role in combining clinical, environmental, and demographic data with high-throughput genomic data. Nevertheless, the concept of data integration is not well defined in the literature, and it may mean different things to different researchers. In this paper, we first propose a conceptual framework for integrating genetic, genomic, and proteomic data. The framework captures fundamental aspects of data integration and is built around the key steps in genetic, genomic, and proteomic data fusion. Second, we review some of the most commonly used current methods and approaches for combining genomic data, with a focus on the statistical aspects.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ced/2950414/46a5e2139d62/HGP2009-869093.001.jpg
