Hiissa Jukka, Elo Laura L, Huhtinen Kaisa, Perheentupa Antti, Poutanen Matti, Aittokallio Tero
Biomathematics Research Group, Department of Mathematics, University of Turku, Turku, Finland.
OMICS. 2009 Oct;13(5):381-96. doi: 10.1089/omi.2009.0027.
Genome-scale molecular profiling of clinical sample material often results in heterogeneous datasets beyond the capability of standard statistical procedures. Statistical tests for differential expression, in particular, rely upon the assumption that the sample groups being compared are relatively homogeneous. Such an assumption rarely holds in clinical materials, which leads to the detection of secondary findings (false positives) or the loss of significant targets (false negatives). Here, we introduce a resampling-based procedure, named ReScore, which aggregates individual changes across all the samples while preserving their clinical classes, and thereby provides multiple sets of markers that can effectively characterize distinct sample subsets. When applied to a public leukemia microarray study, the procedure accurately revealed hidden subgroup structures associated with underlying genotypic abnormalities. It improved both the sensitivity and the specificity of the findings, and helped us to identify several disease subtype-specific genes that had remained undetected in conventional analyses. In our endometriosis study, we were able to accurately distinguish between various sources of systematic variation, linked, for example, to tissue specificity and disease-related factors, many of which would have been missed with standard approaches. The generic procedure should also benefit other global profiling experiments, such as mass spectrometry-based proteomic assays.
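The general idea of resampling while preserving clinical classes can be illustrated with a minimal sketch. This is not the ReScore algorithm, whose details are not given in the abstract. It is a generic class-stratified bootstrap that counts how often each feature ranks highest by absolute mean difference. The function names, the scoring statistic, the `top_k` and `n_rounds` parameters, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def class_preserving_resample(labels, rng):
    """Bootstrap sample indices separately within each class (hypothetical helper)."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    # Draw with replacement inside each class, so class sizes are preserved.
    return {lab: [rng.choice(idxs) for _ in idxs] for lab, idxs in by_class.items()}


def selection_frequency(expr, labels, case, control, top_k=1, n_rounds=200, seed=0):
    """Fraction of resampling rounds in which each feature ranks in the top_k.

    expr maps a feature name to its list of values, one per sample.
    The score is the absolute difference of class means (an assumed statistic).
    """
    rng = random.Random(seed)
    counts = {feature: 0 for feature in expr}
    for _ in range(n_rounds):
        picked = class_preserving_resample(labels, rng)
        scores = {}
        for feature, values in expr.items():
            case_mean = mean(values[i] for i in picked[case])
            control_mean = mean(values[i] for i in picked[control])
            scores[feature] = abs(case_mean - control_mean)
        for feature in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            counts[feature] += 1
    return {feature: count / n_rounds for feature, count in counts.items()}


# Toy data (invented): four controls followed by four cases.
labels = ["ctrl"] * 4 + ["case"] * 4
expr = {
    "gene_a": [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0],  # shifted in cases
    "gene_b": [1.0, 1.1, 0.9, 1.0, 1.0, 1.1, 0.9, 1.0],  # unchanged
}
freq = selection_frequency(expr, labels, case="case", control="ctrl")
print(freq)
```

Aggregating over many class-stratified resamples, rather than testing once on the full groups, is one simple way to make a ranking less dependent on any particular subset of samples. How ReScore itself scores and aggregates changes is described in the full paper, not here.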