Cline Melissa, Hughey Richard, Karplus Kevin
Center for Biomolecular Science and Engineering, Jack Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA.
Bioinformatics. 2002 Feb;18(2):306-14. doi: 10.1093/bioinformatics/18.2.306.
Protein sequence alignments have a myriad of applications in bioinformatics, including secondary and tertiary structure prediction, homology modeling, and phylogeny. Unfortunately, all alignment methods make mistakes, and mistakes in alignments often yield mistakes in their application. Thus, a method to identify and remove suspect alignment positions could benefit many areas in protein sequence analysis.
We tested four predictors of alignment position reliability, including near-optimal alignment information, column score, and secondary structural information. We validated each predictor against a large library of alignments by removing the positions it predicted as unreliable. Near-optimal alignment information was the best predictor: it removed 70% of the substantially misaligned positions and 58% of the over-aligned positions, while retaining 86% of the accurately aligned positions.
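The validation described above, removing positions predicted as unreliable and then counting what was removed in each class, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `filter_summary`, the 0-to-1 reliability scale, the 0.5 cutoff, the class labels, and the toy data are hypothetical and are not the paper's predictors, thresholds, or results.

```python
from collections import Counter


def filter_summary(scores, labels, threshold):
    """Return, per class, the fraction of positions removed by a cutoff.

    scores    -- hypothetical per-position reliability scores (higher = more reliable)
    labels    -- true class of each position, e.g. "accurate", "misaligned", "overaligned"
    threshold -- positions scoring below this value are removed
    """
    total = Counter(labels)
    removed = Counter(lab for s, lab in zip(scores, labels) if s < threshold)
    return {lab: removed[lab] / total[lab] for lab in total}


# Toy data only: eight alignment positions with made-up scores.
scores = [0.9, 0.8, 0.2, 0.4, 0.1, 0.6, 0.95, 0.7]
labels = ["accurate", "accurate", "misaligned", "accurate",
          "overaligned", "misaligned", "accurate", "overaligned"]

summary = filter_summary(scores, labels, threshold=0.5)
retained_accurate = 1.0 - summary["accurate"]
```

A useful predictor in this framing is one that drives the removed fraction high for the misaligned and over-aligned classes while keeping `retained_accurate` high; moving the cutoff trades one against the other.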