Ioannou Alexis, Fokianos Konstantinos, Promponas Vasilis J
Department of Mathematics & Statistics, University of Cyprus, Nicosia, Cyprus.
Biosystems. 2010 May;100(2):132-43. doi: 10.1016/j.biosystems.2010.02.008. Epub 2010 Mar 4.
We compare several spectral-domain clustering methods for partitioning protein sequence data. The main instrument for this exercise is the spectral density ratio model, which specifies that the logarithm of the ratio of two or more unknown spectral density functions is a parametric linear combination of cosines. Maximum likelihood inference is worked out in detail, and its output is shown to yield several distance measures among independent stationary time series. These distance measures are suitable for clustering time series data based on their second-order properties. Other spectral-domain distances are investigated as well, and we apply all methods and distances to the problem of producing segmentations of bacterial outer membrane proteins that are consistent with their transmembrane topology. Protein sequences are transformed to time series data by employing numerical scales of physicochemical parameters. We also present results on the prediction of transmembrane beta-strands, based on the clustering outcome, for a representative set of bacterial outer membrane proteins with known three-dimensional structure.
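To make the sequence-to-time-series step concrete, the following is a minimal sketch and not the authors' method. It assumes the Kyte-Doolittle hydropathy scale as the physicochemical scale (the abstract does not name the scales used) and substitutes a simple nonparametric log-periodogram distance for the likelihood-based distances derived from the spectral density ratio model. The function names and the two example windows are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values. Assumption: the abstract does not say
# which physicochemical scales the authors actually used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_series(sequence):
    """Map a protein sequence to a mean-centred numerical series."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def periodogram(series):
    """Raw periodogram at Fourier frequencies 2*pi*k/n, k = 1..floor((n-1)/2)."""
    n = len(series)
    ordinates = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * k / n
        dft = sum(x * cmath.exp(-1j * w * t) for t, x in enumerate(series))
        ordinates.append(abs(dft) ** 2 / (2 * math.pi * n))
    return ordinates


def log_spectral_distance(seq_a, seq_b, eps=1e-12):
    """Mean squared difference of log-periodograms of two equal-length windows.

    This is a stand-in distance, not the spectral density ratio model.
    eps guards against taking the log of a (numerically) zero ordinate.
    """
    pa = periodogram(to_series(seq_a))
    pb = periodogram(to_series(seq_b))
    if len(pa) != len(pb):
        raise ValueError("windows must have the same length")
    diffs = [(math.log(x + eps) - math.log(y + eps)) ** 2 for x, y in zip(pa, pb)]
    return sum(diffs) / len(diffs)


# Hypothetical 20-residue windows, used only to exercise the functions.
hydrophobic_window = "AVLIAVLIAVLIAVLIAVLI"
polar_window = "GSDKGSDKGSDKGSDKGSDK"
distance = log_spectral_distance(hydrophobic_window, polar_window)
```

A matrix of such pairwise distances over sequence windows could then be passed to any standard clustering routine. Because only second-order (spectral) properties enter the distance, two windows with the same periodicity pattern but different mean hydropathy are treated as similar.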