Fury Wen, Batliwalla Franak, Gregersen Peter K, Li Wentian
Regeneron Pharmaceutical Inc., Tarrytown, NY 10591, USA.
Conf Proc IEEE Eng Med Biol Soc. 2006;2006:5531-4. doi: 10.1109/IEMBS.2006.260828.
When the same set of genes appears in the top-ranking gene lists of two different studies, it is often of interest to estimate the probability that this is a chance event. This overlapping probability is well known to follow the hypergeometric distribution. Usually, the lengths of the top-ranking gene lists are assumed to be fixed by a pre-set criterion, e.g., a threshold on the p-value of a t-test. We investigate how the overlapping probability changes with the gene selection criterion or, more simply, with the length of the top-ranking gene lists. We conclude that the overlapping probability is indeed a function of the gene list length, and that its statistical significance should be quoted in the context of the gene selection criterion.
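The hypergeometric calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `overlap_p_value` and `scan_list_lengths`, the total of 10,000 genes, the list lengths, and the observed overlap of 5 are all hypothetical values chosen for the example. It assumes the p-value of interest is the upper tail, i.e. the probability of seeing at least the observed number of shared genes when both lists are drawn at random from the same gene pool.

```python
from math import comb


def overlap_p_value(total_genes, len_a, len_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of genes shared by two lists of lengths len_a and
    len_b, each drawn at random from a pool of total_genes genes.
    """
    max_overlap = min(len_a, len_b)
    denom = comb(total_genes, len_b)
    # Sum the hypergeometric probability mass from the observed overlap upward.
    tail = sum(
        comb(len_a, k) * comb(total_genes - len_a, len_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denom


def scan_list_lengths(total_genes, lengths, overlap):
    """Return (length, p-value) pairs for equal-length lists (illustrative)."""
    return [(n, overlap_p_value(total_genes, n, n, overlap)) for n in lengths]


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 genes, 5 shared genes, varying list length.
    for n, p in scan_list_lengths(10_000, [50, 100, 200], overlap=5):
        print(f"list length {n:4d}: P(overlap >= 5) = {p:.3e}")
```

With the overlap held fixed, the tail probability grows as the lists get longer, which is why a quoted significance is only meaningful alongside the selection criterion that determined the list length.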