Xie Yang, Ahn Chul
Division of Biostatistics, Department of Clinical Sciences, The Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
Methods Mol Biol. 2010;620:511-29. doi: 10.1007/978-1-60761-580-4_19.
Large-scale sequencing, copy number, mRNA, and protein data hold great promise for biomedical research while posing major challenges for data management and analysis. Integrating different types of high-throughput data from diverse sources can increase the statistical power of an analysis and provide deeper biological understanding. This chapter uses two biomedical research examples to illustrate the urgent need for reliable and robust methods for integrating heterogeneous data. We then introduce and review recently developed statistical methods for integrative analysis, covering both statistical inference and classification. Finally, we present useful publicly accessible databases and program code to facilitate integrative analysis in practice.