Zintzaras Elias, Ioannidis John P A
Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece.
Comput Biol Chem. 2008 Feb;32(1):38-46. doi: 10.1016/j.compbiolchem.2007.09.003. Epub 2007 Sep 14.
The combination of results from different large-scale datasets of multidimensional biological signals (such as gene expression profiling) presents a major challenge. Methodologies are needed that can efficiently combine diverse datasets and can also test the extent of diversity (heterogeneity) across the combined studies. We developed METa-analysis of RAnked DISCovery datasets (METRADISC), a generalized meta-analysis method for combining information across discovery-oriented datasets and for testing between-study heterogeneity for each biological variable of interest. The method is based on non-parametric Monte Carlo permutation testing. The tested biological variables are ranked in each study according to the level of statistical significance. For each biological variable of interest, METRADISC tests its average rank and the between-study heterogeneity of the study-specific ranks. After accounting for ties and differences in tested variables across studies, we randomly permute the ranks of each study and calculate the simulated metrics of average rank and heterogeneity. The procedure is repeated to generate null distributions for the metrics. The use of METRADISC is demonstrated empirically using gene expression data from seven studies comparing prostate cancer cases and normal controls. We offer a new tool for combining complex datasets derived from massive testing, discovery-oriented research and for examining the diversity of results across the combined studies.
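The permutation procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: every variable is tested in every study and there are no ties (the abstract says METRADISC accounts for both, but does not say how); a larger rank is taken to mean stronger statistical significance; heterogeneity is measured as the sum of squared deviations of the study-specific ranks from the average rank; and p-values use the common (count + 1) / (permutations + 1) convention. The function names `rank_metrics` and `permutation_p_values` are hypothetical.

```python
import random
from typing import List, Tuple


def rank_metrics(ranks: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Per-variable average rank and heterogeneity.

    ranks[s][g] is the rank of variable g in study s.
    Heterogeneity here is an ASSUMED metric: the sum of squared
    deviations of the study-specific ranks from the average rank.
    """
    n_studies = len(ranks)
    n_vars = len(ranks[0])
    avg = [sum(study[g] for study in ranks) / n_studies for g in range(n_vars)]
    het = [sum((study[g] - avg[g]) ** 2 for study in ranks) for g in range(n_vars)]
    return avg, het


def permutation_p_values(
    ranks: List[List[int]], n_perm: int = 1000, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Monte Carlo p-values for high average rank and high heterogeneity.

    Each permutation shuffles the ranks within every study independently,
    which breaks any link between a variable and its ranks across studies.
    """
    rng = random.Random(seed)
    obs_avg, obs_het = rank_metrics(ranks)
    n_vars = len(obs_avg)
    avg_hits = [0] * n_vars  # simulations with average rank >= observed
    het_hits = [0] * n_vars  # simulations with heterogeneity >= observed

    for _ in range(n_perm):
        # rng.sample returns a shuffled copy, leaving the input untouched
        shuffled = [rng.sample(study, len(study)) for study in ranks]
        sim_avg, sim_het = rank_metrics(shuffled)
        for g in range(n_vars):
            if sim_avg[g] >= obs_avg[g]:
                avg_hits[g] += 1
            if sim_het[g] >= obs_het[g]:
                het_hits[g] += 1

    # Add-one convention (an assumption) keeps p-values strictly above zero
    p_avg = [(c + 1) / (n_perm + 1) for c in avg_hits]
    p_het = [(c + 1) / (n_perm + 1) for c in het_hits]
    return p_avg, p_het


if __name__ == "__main__":
    # Toy data: 3 studies, 6 variables; variable 0 is top-ranked everywhere.
    toy_ranks = [
        [6, 1, 2, 3, 4, 5],
        [6, 3, 1, 5, 2, 4],
        [6, 2, 5, 1, 3, 4],
    ]
    p_avg, p_het = permutation_p_values(toy_ranks, n_perm=2000, seed=1)
    for g, (pa, ph) in enumerate(zip(p_avg, p_het)):
        print(f"variable {g}: p(average rank) = {pa:.4f}, p(heterogeneity) = {ph:.4f}")
```

In the toy data, variable 0 holds the top rank in all three studies, so its average rank is as high as possible and its study-specific ranks do not differ at all. A real analysis would also need the handling of ties and of variables missing from some studies that the abstract mentions, which this sketch omits.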