White J V, Stultz C M, Smith T F
TASC, Reading, Massachusetts 01867.
Math Biosci. 1994 Jan;119(1):35-75. doi: 10.1016/0025-5564(94)90004-3.
The prediction of a protein's tertiary structural class from its amino-acid sequence is formulated as a signal-processing problem. The amino-acid sequence is treated as a "time series" of symbols containing signals that determine the protein's structural class. A methodology is described for building detailed stochastic signal models for recognized structural classes of single-domain proteins. We solve the problem of determining which model, from a set of candidates, is the most probable generator of a protein's entire amino-acid sequence. The solution employs a nonlinear, optimal filtering algorithm that is suited to implementation on parallel computer architectures. Previous approaches have correctly classified only 80% of single-domain proteins, and only within three very broad structural types; our approach achieves the same accuracy across twelve much more detailed classes.
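The model-selection step described above can be illustrated with a minimal sketch. The abstract does not specify the models or the filtering algorithm, so everything below is an assumption made for illustration: each candidate class is represented by a small discrete hidden Markov model, a standard log-space forward recursion stands in for the paper's nonlinear optimal filter, and the two-letter alphabet (`H`, `P`), the model names `blocky` and `alternating`, and all probabilities are hypothetical toy values, not the twelve structural classes of the paper.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def log_likelihood(seq, model):
    """Forward recursion in log space: returns log P(seq | model)."""
    init, trans, emit = model["init"], model["trans"], model["emit"]
    states = list(init)
    # alpha[s] = log P(symbols seen so far, current hidden state = s)
    alpha = {s: math.log(init[s] * emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: math.log(emit[s][symbol])
            + log_sum_exp([alpha[r] + math.log(trans[r][s]) for r in states])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))

def classify(seq, models, priors=None):
    """Return (most probable model name, posterior over all candidates)."""
    names = list(models)
    if priors is None:  # assumption: uniform prior over candidate models
        priors = {name: 1.0 / len(names) for name in names}
    joint = {n: log_likelihood(seq, models[n]) + math.log(priors[n]) for n in names}
    norm = log_sum_exp(list(joint.values()))
    posterior = {n: math.exp(joint[n] - norm) for n in names}
    return max(posterior, key=posterior.get), posterior

# Hypothetical toy models over a reduced alphabet (H = hydrophobic, P = polar).
EMIT = {"a": {"H": 0.8, "P": 0.2}, "b": {"H": 0.2, "P": 0.8}}
MODELS = {
    # Hidden state tends to persist, producing runs of similar residues.
    "blocky": {
        "init": {"a": 0.5, "b": 0.5},
        "trans": {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}},
        "emit": EMIT,
    },
    # Hidden state tends to flip, producing alternating residues.
    "alternating": {
        "init": {"a": 0.5, "b": 0.5},
        "trans": {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}},
        "emit": EMIT,
    },
}

if __name__ == "__main__":
    for sequence in ("HHHHPPPP", "HPHPHPHP"):
        best, posterior = classify(sequence, MODELS)
        print(sequence, best, posterior)
```

Because each candidate model is scored independently against the whole sequence, the per-model likelihood computations could in principle run in parallel, which is consistent with the abstract's remark about parallel architectures.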