Niranjan Nagarajan, Golan Yona
Department of Computer Science, Cornell University, Upson Hall, Ithaca, NY 14853, USA.
Bioinformatics. 2004 Jun 12;20(9):1335-60. doi: 10.1093/bioinformatics/bth086. Epub 2004 Feb 12.
We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains.
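The pipeline in the paragraph above has three stages: per-position measures are combined into one score, the score profile is smoothed, and domain transitions are picked from the smoothed profile. The sketch below mirrors that flow under loudly stated assumptions and is not the authors' implementation. A single logistic unit with hypothetical hand-set `weights` and `bias` stands in for the trained neural network. A moving average stands in for the smoothing step. A greedy peak picker with hypothetical `threshold` and `min_sep` parameters stands in for the paper's probabilistic model. The input measures are toy numbers rather than values derived from real alignments.

```python
import math
from typing import List, Sequence


def combine_measures(measures: Sequence[Sequence[float]],
                     weights: Sequence[float], bias: float) -> List[float]:
    """Combine per-position measures into one score in (0, 1).

    ASSUMPTION: a single logistic unit replaces the paper's neural network.
    measures[k][i] is the value of (hypothetical) measure k at residue i.
    """
    n = len(measures[0])
    scores = []
    for i in range(n):
        z = bias + sum(w * m[i] for w, m in zip(weights, measures))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def smooth(scores: Sequence[float], half_window: int) -> List[float]:
    """Moving average over up to 2*half_window+1 residues, truncated at the ends."""
    n = len(scores)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def pick_boundaries(scores: Sequence[float], threshold: float,
                    min_sep: int) -> List[int]:
    """Greedily keep the highest-scoring positions at or above threshold,
    requiring chosen positions to be at least min_sep residues apart.

    ASSUMPTION: this simple rule replaces the paper's probabilistic model.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    for i in order:
        if scores[i] < threshold:
            break  # remaining positions all score lower
        if all(abs(i - j) >= min_sep for j in chosen):
            chosen.append(i)
    return sorted(chosen)


# Toy example: measure 1 peaks around residue 10; measure 2 is flat.
m1 = [0.0] * 20
m1[9], m1[10], m1[11] = 3.0, 5.0, 3.0
m2 = [0.0] * 20
raw = combine_measures([m1, m2], weights=[1.0, 0.5], bias=-2.0)
profile = smooth(raw, half_window=1)
print(pick_boundaries(profile, threshold=0.5, min_sep=5))  # [10]
```

Smoothing before peak picking keeps an isolated noisy position from being reported as a transition. The `min_sep` constraint stops one broad peak from producing several adjacent boundary calls.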
The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well in terms of both accuracy and sensitivity. It improves significantly over the best available methods, including some semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions suggested by our method are also discussed.
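One common way to score predicted domain boundaries against reference definitions such as SCOP or CATH is to count a prediction as correct when it falls within a fixed number of residues of a reference boundary. The sketch below assumes that convention, with a hypothetical `tolerance` parameter. It treats accuracy as the fraction of predicted boundaries near a true one and sensitivity as the fraction of true boundaries near a prediction. The paper's exact definitions may differ. The boundary positions in the example are made up for illustration.

```python
from typing import Sequence, Tuple


def boundary_scores(predicted: Sequence[int], true: Sequence[int],
                    tolerance: int) -> Tuple[float, float]:
    """Return (accuracy, sensitivity) for domain-boundary predictions.

    ASSUMPTION: a predicted boundary is correct if any true boundary lies
    within +/- tolerance residues; a true boundary is found if any prediction
    lies within +/- tolerance residues. No one-to-one matching is enforced.
    """
    def near(x: int, others: Sequence[int]) -> bool:
        return any(abs(x - y) <= tolerance for y in others)

    correct = sum(1 for p in predicted if near(p, true))
    found = sum(1 for t in true if near(t, predicted))
    accuracy = correct / len(predicted) if predicted else 0.0
    sensitivity = found / len(true) if true else 0.0
    return accuracy, sensitivity


# Made-up example: two reference boundaries, three predictions.
acc, sens = boundary_scores(predicted=[98, 250, 400], true=[100, 260],
                            tolerance=20)
print(acc, sens)  # 0.6666666666666666 1.0
```

Because matching is not one-to-one, two predictions close to the same reference boundary both count as correct. A stricter evaluation would pair each reference boundary with at most one prediction.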
An online domain-prediction server is available at http://biozon.org/tools/domains/