Park Yonil, Sheetlin Sergey, Ma Ning, Madden Thomas L, Spouge John L
National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA.
BMC Res Notes. 2012 Jun 12;5:286. doi: 10.1186/1756-0500-5-286.
Local alignment programs often calculate the probability that a match occurred by chance. The calculation of this probability may require a "finite-size" correction to the lengths of the sequences, as an alignment that starts near the end of either sequence may run out of sequence before achieving a significant score.
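The edge effect described above can be illustrated with a classical style of finite-size correction. The sketch below is not the method of this paper. It assumes a simple Altschul-Gish style estimate ell = ln(K m n)/H for the length an alignment needs to reach a significant score, subtracts ell from each sequence length, and applies a hypothetical floor of 1.0 when the corrected length would become non-positive. The Karlin-Altschul parameters LAM, K, and H are illustrative values assumed here (roughly those quoted for BLOSUM62 with gap costs 11/1).

```python
import math

def edge_corrected_evalue(score, m, n, lam, K, H, floor=1.0):
    """E-value with a simple finite-size (edge) correction.

    Returns (evalue, ell, m_eff, n_eff). Assumptions for illustration only:
    - ell = ln(K*m*n)/H approximates the length an alignment needs
      to reach a significant score;
    - each sequence length is shortened by ell;
    - `floor` is a hypothetical ad hoc minimum effective length, used when
      the correction would leave a short sequence with no usable length.
    """
    ell = math.log(K * m * n) / H
    m_eff = max(m - ell, floor)
    n_eff = max(n - ell, floor)
    evalue = K * m_eff * n_eff * math.exp(-lam * score)
    return evalue, ell, m_eff, n_eff

# Illustrative parameters (assumed, roughly BLOSUM62 with gap costs 11/1).
LAM, K, H = 0.267, 0.041, 0.14

for m in (100, 1000):
    e, ell, m_eff, n_eff = edge_corrected_evalue(50, m, 1_000_000, LAM, K, H)
    raw = K * m * 1_000_000 * math.exp(-LAM * 50)  # no length correction
    print(f"m={m}: ell={ell:.1f}, m_eff={m_eff:.1f}, E={e:.3g} (uncorrected {raw:.3g})")
```

With these numbers, ell for a 100-residue query against a 10^6-residue database is about 109, longer than the query itself, so the query's effective length collapses to the ad hoc floor. This is the kind of short-sequence behavior the abstract refers to.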
We present an improved finite-size correction that considers the distribution of sequence lengths rather than simply the corresponding means. This approach improves sensitivity and avoids substituting an ad hoc length for short sequences, a substitution that can underestimate the significance of a match. We use a test set derived from ASTRAL to show improved ROC scores, especially for shorter sequences.
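The difference between correcting with mean lengths and correcting with a length distribution can be sketched as follows. This is a simplified illustration, not the paper's formula. It assumes, hypothetically, that the alignment lengths L1 and L2 in the two sequences are independent and normally distributed with mean MU and standard deviation SIGMA (made-up values). It then compares the plug-in area (m - mu)^+ (n - mu)^+ with the expected area E[(m - L1)^+] E[(n - L2)^+]. The closed form E[(a - L)^+] = (a - mu) Phi(z) + sigma phi(z), with z = (a - mu)/sigma, is the standard expected positive part of a normal variable.

```python
import math

def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_positive_part(a, mu, sigma):
    """E[(a - L)^+] for L ~ Normal(mu, sigma^2)."""
    if sigma <= 0:
        return max(a - mu, 0.0)  # degenerate case: L equals its mean
    z = (a - mu) / sigma
    return (a - mu) * normal_cdf(z) + sigma * normal_pdf(z)

def area_plugin(m, n, mu1, mu2):
    """Effective search area using only the mean alignment lengths."""
    return max(m - mu1, 0.0) * max(n - mu2, 0.0)

def area_distribution(m, n, mu1, sigma1, mu2, sigma2):
    """Expected effective area, assuming (hypothetically) independent
    normal alignment lengths in the two sequences."""
    return expected_positive_part(m, mu1, sigma1) * expected_positive_part(n, mu2, sigma2)

# Hypothetical mean and spread of the alignment length.
MU, SIGMA = 110.0, 20.0

for m in (100, 150, 1000):
    plug = area_plugin(m, 1_000_000, MU, MU)
    dist = area_distribution(m, 1_000_000, MU, SIGMA, MU, SIGMA)
    print(f"m={m}: plug-in area={plug:.4g}, distribution-based area={dist:.4g}")
```

Because (a - L)^+ is convex in L, each expected factor is at least its plug-in value, so the distribution-based area is never smaller than the plug-in area (Jensen's inequality). For a short sequence whose length is close to the mean alignment length, the plug-in area drops to zero, while the distribution-based area stays positive and no ad hoc replacement length is needed.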
The new finite-size correction improves the calculation of probabilities for a local alignment. It is now used in the BLAST+ package and at the NCBI BLAST web site (http://blast.ncbi.nlm.nih.gov).