Vilardell Mireia, Sánchez-Pla Alex
Statistics Department, University of Barcelona, Barcelona, Spain.
Bioinformatics. 2006 Dec 15;22(24):3003-8. doi: 10.1093/bioinformatics/btl544. Epub 2006 Oct 25.
Many gene identification methods assign scores to gene elements before assembling them into predicted genes. The scoring system is often based on log-likelihood ratios. These methods usually perform well, but it is difficult to interpret how significant a given score is.
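To make the scoring concrete, here is a minimal sketch of a log-likelihood-ratio score for a DNA segment. It assumes a deliberately simple model in which nucleotides are independent, with hypothetical "coding" and "background" frequencies; real gene finders typically use richer models such as higher-order Markov chains. The function name `llr_score` and the frequency tables are illustrative only.

```python
import math

# Hypothetical nucleotide frequencies (illustrative only). Real gene finders
# estimate their models from training data, usually with higher-order Markov chains.
CODING = {"A": 0.22, "C": 0.28, "G": 0.30, "T": 0.20}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def llr_score(seq, signal=CODING, background=BACKGROUND):
    """Return log P(seq | signal) - log P(seq | background), assuming
    independent, identically distributed nucleotides under both models."""
    return sum(math.log(signal[b]) - math.log(background[b]) for b in seq.upper())


print(llr_score("GCGCGCAT"))
```

A positive score means the segment fits the "coding" model better than the background. The raw number, however, carries no P-value: it does not say how often a score this large would arise by chance. That gap is what significance tests are meant to fill.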
We have developed several tests of significance for these scores: (1) a sum-of-scores test (SST); (2) an intersection-union test (IUT), based on a multiple-hypothesis-testing interpretation of an exon's score; and (3) a meta-analytical approach (MA), which combines several P-values, each corresponding to one of the exon's parts, into a global P-value. Our simulation studies show that the MA has better sensitivity and specificity than other methods and is easier for non-expert users to interpret. This is an improvement over other methods and is especially relevant for users who want to predict incomplete gene sequences.
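The three tests can be sketched as follows. This sketch rests on several assumptions. Each exon is split into parts, each part has its own score, and samples of null (non-gene) scores are available for each part. The abstract does not say how the MA combines P-values, so Fisher's method stands in for it here. The SST null is simulated by drawing each part's null score independently. All function names and data are hypothetical.

```python
import math
import random


def empirical_pvalue(observed, null_scores):
    """Upper-tail empirical P-value with a +1 correction, so it is never 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def sst_pvalue(part_scores, null_per_part, n_sim=10_000, seed=0):
    """Sum-of-scores test (sketch): compare the exon's total score with
    simulated null totals. ASSUMPTION: part null scores are drawn independently."""
    rng = random.Random(seed)
    observed = sum(part_scores)
    null_totals = [sum(rng.choice(null) for null in null_per_part) for _ in range(n_sim)]
    return empirical_pvalue(observed, null_totals)


def iut_pvalue(part_pvalues):
    """Intersection-union test: the exon is declared significant only if every
    part is, so the overall P-value is the largest part-level P-value."""
    return max(part_pvalues)


def fisher_combine(part_pvalues):
    """Fisher's method (a stand-in for the MA, which may differ):
    X = -2 * sum(ln p_i) follows a chi-square with 2k df under H0.
    For even df the upper tail is exp(-x/2) * sum_{j<k} (x/2)^j / j!."""
    k = len(part_pvalues)
    half_x = -sum(math.log(p) for p in part_pvalues)  # X / 2
    term, tail = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j  # (x/2)^j / j!
        tail += term
    return math.exp(-half_x) * tail


# Hypothetical example: three exon parts (e.g. acceptor site, coding body,
# donor site) with Gaussian null scores standing in for real non-gene data.
rng = random.Random(42)
null_per_part = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
part_scores = [2.5, 1.8, 2.9]  # hypothetical observed part scores

part_p = [empirical_pvalue(s, null) for s, null in zip(part_scores, null_per_part)]
print("SST:", sst_pvalue(part_scores, null_per_part))
print("IUT:", iut_pvalue(part_p))
print("MA (Fisher):", fisher_combine(part_p))
```

The IUT is conservative by construction: a single weak part dominates the result. A combination method such as Fisher's lets strong evidence in some parts offset weaker evidence in others. This is one plausible reason a meta-analytic approach can be more sensitive, though the sketch does not reproduce the authors' simulations.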