Wittkowski Knut M, Song Tingting
Center for Clinical and Translational Science, The Rockefeller University, New York, NY, USA.
Methods Mol Biol. 2010;620:105-53. doi: 10.1007/978-1-60761-580-4_2.
In 2003, the completion of the Human Genome Project (1), together with advances in computational resources (2), was expected to launch an era in which the genetic and genomic contributions to many common diseases would be found. In the years that followed, however, researchers became increasingly frustrated as most reported 'findings' could not be replicated in independent studies (3). To improve the signal-to-noise ratio, it was suggested that the number of cases included be increased to tens of thousands (4), a requirement that would dramatically restrict the scope of personalized medicine. Similarly, there was little success in elucidating the gene-gene interactions involved in complex diseases, or even in developing criteria for assessing their phenotypes. As a partial solution to these enigmata, we introduce a class of statistical methods as the 'missing link' between advances in genetics and informatics. As a first step, we provide a unifying view of a plethora of nonparametric tests developed mainly in the 1940s, all of which can be expressed as u-statistics. We then extend this approach to reflect categorical and ordinal relationships between variables, resulting in a flexible and powerful approach to dealing with the impact of (1) multiallelic genetic loci, (2) poly-locus genetic regions, and (3) oligo-genetic and oligo-genomic collaborative interactions on complex phenotypes.
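To illustrate what it means for a classical nonparametric test to be "expressed as a u-statistic", the sketch below writes two textbook statistics as sums of a kernel over pairs of observations. This is a minimal, hypothetical example for orientation only, not the authors' method: the function names `mann_whitney_u` and `kendall_tau_a` are illustrative assumptions, and the kernels are the standard ones (an indicator of `a < b` with ties counted as 0.5 for the two-sample Mann-Whitney/Wilcoxon statistic, and a product of signs for Kendall's tau-a).

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """Two-sample u-statistic with kernel h(a, b) = 1 if a < b, 0.5 if a == b, else 0.

    Summing the kernel over all (x_i, y_j) pairs gives the Mann-Whitney U count
    for sample y exceeding sample x (ties counted as one half). Dividing by
    len(x) * len(y) would give the estimate of P(X < Y) + 0.5 * P(X = Y).
    """
    total = 0.0
    for a in x:
        for b in y:
            if a < b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total


def kendall_tau_a(x, y):
    """One-sample u-statistic of degree 2 over paired observations (x_i, y_i).

    Kernel: sign(x_i - x_j) * sign(y_i - y_j), averaged over all unordered pairs.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    pairs = list(combinations(range(len(x)), 2))
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return s / len(pairs)


print(mann_whitney_u([1, 2, 3], [2, 4, 5]))       # 7.5
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

Both statistics depend on the data only through pairwise orderings, which is the common structure a u-statistic framework exploits; replacing the scalar comparison in the kernel with an ordering of multivariate or categorical observations is the kind of extension the abstract describes.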