Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, Norfolk Place, London, W2 1PG, United Kingdom.
Environ Mol Mutagen. 2013 Aug;54(7):542-57. doi: 10.1002/em.21797. Epub 2013 Aug 5.
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges, mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and the advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze complex exposure and OMICS datasets both marginally and jointly.
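As a rough illustration of the first family of approaches named above (univariate models with multiple testing correction), the following minimal sketch applies two standard corrections to a list of per-feature p-values: Bonferroni, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The function names, the example p-values, and the 0.05 threshold are assumptions made for illustration and are not taken from the article.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis when its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from six univariate tests (one per OMICS feature).
p_values = [0.04, 0.001, 0.2, 0.012, 0.03, 0.02]
print(sum(bonferroni(p_values)), "rejections with Bonferroni")
print(sum(benjamini_hochberg(p_values)), "rejections with Benjamini-Hochberg")
```

On these made-up values Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects five of the six, which reflects the usual trade-off between strict family-wise control and the less conservative false discovery rate criterion.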