Durinck Steffen, Spellman Paul T, Birney Ewan, Huber Wolfgang
Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Nat Protoc. 2009;4(8):1184-91. doi: 10.1038/nprot.2009.97. Epub 2009 Jul 23.
Genomic experiments produce multiple views of biological systems, including DNA sequence and copy number variation as well as mRNA and protein abundance. Understanding these systems requires integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, these relationships can be biologically complex, and the content of the databases changes over time. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems encountered when making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.
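One typical problem with gene-to-transcript-to-protein mappings is that they are rarely one-to-one: a gene may have several transcripts, and only some of those encode a protein. The sketch below illustrates this with a small in-memory table. It is not the protocol's method, which uses R and BioMart web services; it is written in Python, and every identifier in it (`GENE_A`, `TX_A1`, `PROT_A1`, the function names) is a hypothetical placeholder rather than a real Ensembl ID or API.

```python
from collections import defaultdict

# Hypothetical toy mapping rows (gene, transcript, protein); not real Ensembl IDs.
# A protein of None marks a transcript with no protein product.
MAPPING_ROWS = [
    ("GENE_A", "TX_A1", "PROT_A1"),
    ("GENE_A", "TX_A2", "PROT_A2"),
    ("GENE_A", "TX_A3", None),
    ("GENE_B", "TX_B1", "PROT_B1"),
    ("GENE_C", "TX_C1", None),
]


def proteins_by_gene(rows):
    """Collect the set of protein IDs reachable from each gene."""
    result = defaultdict(set)
    for gene, _transcript, protein in rows:
        result[gene]  # touch the key so genes without proteins still appear
        if protein is not None:
            result[gene].add(protein)
    return dict(result)


def classify(gene_to_proteins):
    """Label each gene by how many distinct proteins it maps to."""
    labels = {}
    for gene, proteins in gene_to_proteins.items():
        if not proteins:
            labels[gene] = "no protein"
        elif len(proteins) == 1:
            labels[gene] = "one-to-one"
        else:
            labels[gene] = "one-to-many"
    return labels


gene_to_proteins = proteins_by_gene(MAPPING_ROWS)
mapping_classes = classify(gene_to_proteins)
for gene in sorted(mapping_classes):
    print(gene, mapping_classes[gene], sorted(gene_to_proteins[gene]))
```

Classifying the mappings before joining datasets makes the ambiguous cases explicit, so that a decision about them (for example, dropping or aggregating one-to-many genes) is recorded in the analysis script rather than made silently.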