Duan Mojie, Li Minghai, Han Li, Huo Shuanghong
Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, Worcester, Massachusetts, 01610.
Proteins. 2014 Oct;82(10):2585-96. doi: 10.1002/prot.24622. Epub 2014 Jun 19.
Dimensionality reduction is widely used to search for the intrinsic reaction coordinates of protein conformational changes. We find that dimensionality-reduction methods that use the pairwise root-mean-square deviation (RMSD) as the local distance metric face a challenge, and we use Isomap as an example to illustrate the problem. We believe that dimensionality-reduction approaches that aim to preserve the geometric relations between objects carry an implied assumption: the original space and the reduced space have the same kind of geometry, for example both Euclidean or both spherical. When the protein free energy landscape is mapped onto a 2D plane or 3D space, the reduced space is Euclidean, so the original space should also be Euclidean. For a protein with N atoms, its conformation space is a subset of the 3N-dimensional Euclidean space R^(3N). We formally define the protein conformation space as the quotient space of R^(3N) by the equivalence relation of rigid motions. Whether the quotient space is Euclidean or not depends on how it is parameterized. When the pairwise RMSD is employed as the local distance metric, the protein conformation space is represented only implicitly, so it has no direct correspondence to a Euclidean set. We have demonstrated that an explicit Euclidean-based representation of protein conformation space, together with the local distance metric associated with it, improves the quality of dimensionality reduction in the tetra-peptide and β-hairpin systems.
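The contrast drawn in the abstract between an implicit, RMSD-based description and an explicit Euclidean one can be illustrated with a minimal sketch. The first function computes the pairwise RMSD after optimal superposition (the Kabsch procedure), which is a distance between equivalence classes under rigid motions but gives no coordinates for a conformation. The second maps a conformation to an explicit vector on which the ordinary Euclidean distance can be used. The vector of interatomic distances is an assumption chosen here for illustration only; the abstract does not say which explicit representation the authors use, and distance vectors also identify mirror images, which rigid motions do not. The function names `kabsch_rmsd`, `distance_vector`, and `rigid_copy` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (N, 3) coordinate sets over rigid motions."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # d = -1 means the best orthogonal fit is a reflection; flip the
    # smallest singular value so that only proper rotations are allowed.
    d = np.sign(np.linalg.det(U @ Vt))
    overlap = S[0] + S[1] + d * S[2]
    sq_dev = (P ** 2).sum() + (Q ** 2).sum() - 2.0 * overlap
    return np.sqrt(max(sq_dev, 0.0) / len(P))


def distance_vector(X):
    """Illustrative explicit representation (an assumption, not the paper's):
    all N(N-1)/2 interatomic distances, invariant to rigid motions."""
    i, j = np.triu_indices(len(X), k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)


def rigid_copy(X, theta, shift):
    """Rotate X by angle theta about the z axis, then translate by shift."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return X @ R.T + np.asarray(shift)
```

Under these assumptions, `kabsch_rmsd(X, rigid_copy(X, 0.7, [1.0, 2.0, 3.0]))` is zero up to rounding, and `distance_vector` returns the same vector for both copies. The difference is that the second function assigns each conformation a point in a Euclidean space, whereas the RMSD only supplies pairwise values.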