Principal component analysis in protein tertiary structure prediction.

Author information

Álvarez Óscar, Fernández-Martínez Juan Luis, Fernández-Brillet Celia, Cernea Ana, Fernández-Muñiz Zulima, Kloczkowski Andrzej

Affiliations

* Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain.

† Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA.

Publication information

J Bioinform Comput Biol. 2018 Apr;16(2):1850005. doi: 10.1142/S0219720018500051. Epub 2018 Feb 22.

DOI:10.1142/S0219720018500051

PMID:29566640

Abstract

We discuss the applicability of principal component analysis (PCA) to protein tertiary structure prediction from amino acid sequence. The algorithm presented in this paper belongs to the category of protein refinement models and involves establishing a low-dimensional space in which the sampling (and optimization) is carried out via a particle swarm optimizer (PSO). The reduced space is found via PCA performed on a set of low-energy protein models previously found using different optimization techniques. A high-frequency term is added to this expansion by projecting the best decoy onto the PCA basis set and calculating the residual model. This term is intended to provide high-frequency details in the energy optimization. The goal of this research is to analyze how the dimensionality reduction affects the prediction capability of the PSO procedure. For that purpose, different proteins from the Critical Assessment of Techniques for Protein Structure Prediction experiments were modeled. In all cases, both the energy of the best decoy and its distance to the native structure decreased. Our analysis also shows how the predicted backbone structure of the native conformation and of alternative low-energy states varies with the PCA dimensionality. Generally speaking, the reconstruction can be successfully achieved with 10 principal components and the high-frequency term. We also provide a computational analysis of the protein energy landscape for the inverse problem of reconstructing the structure from a reduced number of principal components, showing that the dimensionality reduction alleviates the ill-posed character of this high-dimensional energy optimization problem. The procedure explained in this paper is very fast and allows different PCA expansions to be tested. Our results show that PSO improves the energy of the best decoy used in the PCA when an adequate number of PCA terms is considered.
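
The workflow described in the abstract (PCA on a set of decoys, projection of the best decoy, a residual "high-frequency" term, and PSO in the reduced space) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`pca_basis`, `reduce_and_residual`, `reconstruct`, `pso_minimize`) are hypothetical, the energy is a toy quadratic standing in for a real protein energy function, the decoys are random vectors rather than protein coordinates, and the PSO coefficients (inertia 0.7, accelerations 1.5) are common textbook values, not values taken from the paper.

```python
import numpy as np

def pca_basis(decoys, n_components):
    """Return the mean model and an orthonormal PCA basis (rows) of the decoy set."""
    mean = decoys.mean(axis=0)
    _, _, vt = np.linalg.svd(decoys - mean, full_matrices=False)
    return mean, vt[:n_components]

def reduce_and_residual(best, mean, basis):
    """Project the best decoy onto the basis; the residual is the high-frequency term."""
    coeffs = basis @ (best - mean)
    residual = best - (mean + basis.T @ coeffs)
    return coeffs, residual

def reconstruct(coeffs, mean, basis, residual):
    """Map reduced coordinates back to the full model space."""
    return mean + basis.T @ coeffs + residual

def pso_minimize(energy, start, n_particles=30, n_iter=200, spread=1.0, seed=0):
    """Basic global-best PSO in the reduced space (assumed textbook parameters)."""
    rng = np.random.default_rng(seed)
    pos = start + spread * rng.standard_normal((n_particles, start.size))
    pos[0] = start  # keep the best decoy itself in the swarm
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([energy(p) for p in pos])
    i = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[i].copy(), pbest_val[i]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([energy(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        i = int(np.argmin(pbest_val))
        if pbest_val[i] < gbest_val:
            gbest, gbest_val = pbest[i].copy(), pbest_val[i]
    return gbest, gbest_val

# Toy data (assumption): a hypothetical "native" vector and noisy decoys around it.
rng = np.random.default_rng(1)
native = rng.standard_normal(60)
decoys = native + 0.5 * rng.standard_normal((25, 60))

def energy_full(x):
    """Toy energy: squared distance to the native vector (stand-in only)."""
    return float(np.sum((x - native) ** 2))

best = decoys[int(np.argmin([energy_full(d) for d in decoys]))]
mean, basis = pca_basis(decoys, n_components=10)
coeffs, residual = reduce_and_residual(best, mean, basis)

def energy_reduced(a):
    """Energy evaluated on the model reconstructed from reduced coordinates."""
    return energy_full(reconstruct(a, mean, basis, residual))

a_opt, e_opt = pso_minimize(energy_reduced, coeffs)
```

Because the best decoy is seeded into the swarm and the residual term makes its reconstruction exact, the optimized energy in this sketch can never be worse than the energy of the starting decoy. Varying `n_components` is a cheap way to compare different PCA expansions in this toy setting.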
