Webber Caleb, Barton Geoffrey J
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England, UK.
Bioinformatics. 2003 Jul 22;19(11):1397-403. doi: 10.1093/bioinformatics/btg156.
Sequence alignment methods that compare two sequences (pairwise methods) are important tools for detecting biological sequence relationships. In genome annotation, multiple methods are often run and agreement between them is taken as confirmation. In this paper, we assess the advantages of combining search methods by comparing seven pairwise alignment methods: three local dynamic programming algorithms (PRSS, SSEARCH and SCANPS), two global dynamic programming algorithms (GSRCH and AMPS) and two heuristic approximations (BLAST and FASTA). The methods are evaluated individually and by pairwise intersection and union of their result lists at equal p-value cut-offs.
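As an illustration of combining two result lists at an equal p-value cut-off, the following is a minimal sketch only. It assumes each method reports a p-value per (query, target) pair; the function names, the dictionary layout and the toy values are hypothetical and are not taken from the paper.

```python
def hits_at_cutoff(pvalues, cutoff):
    """Return the set of hits whose p-value is at or below the cutoff."""
    return {hit for hit, p in pvalues.items() if p <= cutoff}


def combine(pvalues_a, pvalues_b, cutoff):
    """Apply the same cutoff to both methods, then intersect and unite the hit sets."""
    hits_a = hits_at_cutoff(pvalues_a, cutoff)
    hits_b = hits_at_cutoff(pvalues_b, cutoff)
    return hits_a & hits_b, hits_a | hits_b


# Toy p-values keyed by (query, target); invented for illustration only.
method_a = {("q1", "t1"): 1e-8, ("q1", "t2"): 1e-3, ("q2", "t3"): 0.2}
method_b = {("q1", "t1"): 1e-6, ("q2", "t3"): 1e-4, ("q2", "t4"): 0.5}

both, either = combine(method_a, method_b, cutoff=0.01)
print(sorted(both))    # hits accepted by both methods
print(sorted(either))  # hits accepted by at least one method
```

In this sketch the intersection keeps only hits that both methods accept, trading coverage for fewer errors, while the union keeps hits that either method accepts.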
When applied singly, the dynamic programming methods SCANPS and SSEARCH gave significantly better coverage (p=0.01) than AMPS, GSRCH, PRSS, BLAST and FASTA. Results ranked by BLAST p-values gave significantly better coverage than results ranked by BLAST e-values. Of 56 combinations of eight methods considered, 19 gave significant increases in coverage at low error compared to the parent methods at an equal p-value cut-off. The union of results from BLAST (p-value) and FASTA at an equal p-value cut-off gave significantly better coverage than either method individually. The best overall performance was obtained from the intersection of the results from SSEARCH and the GSRCH62 global alignment method. At an error level of five false positives, this combination found 444 true positives, a significant 12.4% increase over SSEARCH applied alone.
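Coverage at a fixed error level can be sketched as follows. This is a minimal illustration that assumes hits are ranked from most to least significant and that a reference classification marks each hit as true or false; the function name, the toy ranking and the truth set are hypothetical. The last lines check one reading of the combination count that is consistent with the text (unordered pairs of eight methods, each combined by intersection and by union); the abstract does not spell out this breakdown.

```python
from math import comb


def true_positives_at_error(ranked_hits, is_true, max_false_positives):
    """Count true positives ranked before the false-positive budget is exceeded."""
    true_count = 0
    false_count = 0
    for hit in ranked_hits:
        if is_true(hit):
            true_count += 1
        else:
            false_count += 1
            if false_count > max_false_positives:
                break  # budget exceeded: stop counting
    return true_count


# Toy ranking (best first) and truth set; invented for illustration only.
ranked = ["a", "b", "x", "c", "y", "d"]
truth = {"a", "b", "c", "d"}

coverage = true_positives_at_error(ranked, lambda h: h in truth, max_false_positives=1)
print(coverage)

# Assumed breakdown: C(8, 2) unordered pairs, times two set operations.
pair_combinations = comb(8, 2) * 2
print(pair_combinations)
```

Comparing methods at the same false-positive budget, as in this sketch, puts their true-positive counts on a common footing regardless of how each method scales its scores.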