David Charles C, Singam Ettayapuram Ramaprasad Azhagiya, Jacobs Donald J
Department of Bioinformatics and Genomics, University of North Carolina, Charlotte, USA.
Current Address: The New Zealand Institute for Plant & Food Research, Limited, Lincoln, New Zealand.
BMC Bioinformatics. 2017 May 25;18(1):271. doi: 10.1186/s12859-017-1676-y.
Essential Dynamics (ED) is a common application of principal component analysis (PCA) to extract biologically relevant motions from atomic trajectories of proteins. Covariance- and correlation-based PCA are two common approaches to determine PCA modes (eigenvectors) and their eigenvalues. Protein dynamics can be characterized in terms of Cartesian coordinates or internal distance pairs. Comparing trajectories from a set of proteins to assess their similarity provides insight into conserved mechanisms. Comprehensive software is needed to facilitate such comparative analysis, with user-friendly features that are rooted in best practices from multivariate statistics.
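To illustrate the difference between the covariance and correlation approaches, the following minimal sketch builds both matrices from a toy trajectory. It is not taken from JED; the function names `covariance_matrix` and `correlation_matrix` and the toy data are assumptions made for illustration, and the unbiased (n - 1) estimator is one common choice among several.

```python
import math

def covariance_matrix(frames):
    """Return the covariance matrix Q of a trajectory.

    frames: list of equal-length lists, one row per frame and one column
    per coordinate (for example x, y, z of each aligned atom).
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dim)]
    centered = [[f[j] - means[j] for j in range(dim)] for f in frames]
    # Unbiased estimator: divide by n - 1 (an assumption of this sketch).
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]

def correlation_matrix(q):
    """Rescale Q by the standard deviations to obtain the correlation matrix R."""
    sd = [math.sqrt(q[i][i]) for i in range(len(q))]
    return [[q[i][j] / (sd[i] * sd[j]) for j in range(len(q))]
            for i in range(len(q))]

# Toy trajectory: 4 frames, 2 coordinates with very different spread.
toy = [[0.0, 10.0], [1.0, 30.0], [2.0, 20.0], [3.0, 40.0]]
Q = covariance_matrix(toy)
R = correlation_matrix(Q)
```

In Q the second coordinate dominates because its variance is a hundred times larger, whereas R places every coordinate on a unit-variance scale. Diagonalizing Q therefore emphasizes large-amplitude motions, while diagonalizing R emphasizes how strongly coordinates move together regardless of amplitude.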
We developed a Java-based Essential Dynamics toolkit called JED to compare the ED from multiple protein trajectories. Trajectories from different simulations and different proteins can be pooled for comparative studies. JED implements Cartesian coordinates (cPCA) and internal distance pair coordinates (dpPCA) as options to construct covariance (Q) or correlation (R) matrices. Statistical methods are implemented for treating outliers, benchmarking sampling adequacy, characterizing the precision of Q and R, and reporting partial correlations. JED outputs results as text files that include transformed coordinates for aligned structures, several metrics that quantify protein mobility, PCA modes with their eigenvalues, and displacement vector (DV) projections onto the top principal modes. PyMOL scripts together with PDB files allow individual Q- and R-cPCA modes, as well as the essential dynamics occurring within user-selected time scales, to be visualized as movies. Subspaces defined by the top eigenvectors are compared using several statistical metrics to quantify similarity/overlap of high-dimensional vector spaces. Free energy landscapes can be generated for both cPCA and dpPCA.
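As an example of a subspace overlap metric, the sketch below computes the root mean square inner product (RMSIP) between two sets of orthonormal modes. RMSIP is a widely used measure for this purpose, but this abstract does not list the specific metrics JED reports, so its use here, the function name `rmsip`, and the toy mode vectors are assumptions for illustration only.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rmsip(basis_a, basis_b):
    """Root mean square inner product of two orthonormal bases of equal size k.

    Returns 1.0 when the two bases span the same subspace and 0.0 when
    the subspaces are mutually orthogonal.
    """
    k = len(basis_a)
    total = sum(dot(u, v) ** 2 for u in basis_a for v in basis_b)
    return math.sqrt(total / k)

# Two 2-D subspaces of a 3-D space that share one direction (the x axis).
modes_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
modes_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
same = rmsip(modes_a, modes_a)
partial = rmsip(modes_a, modes_b)
```

Because the sum runs over all pairs of modes, the value depends only on the subspaces spanned and not on how the modes are ordered or rotated within each set. In practice the inputs would be the top k eigenvectors from two PCA runs on the same set of coordinates.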
JED offers a convenient toolkit that encourages best practices in applying multivariate statistical methods to perform comparative studies of essential dynamics over multiple proteins. For each protein, Cartesian coordinates or internal distance pairs can be employed over the entire structure or user-selected parts to quantify similarities and differences in mobility and correlations in dynamics, and thereby to develop insight into protein structure/function relationships.
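The internal distance pair representation can be sketched as follows: each frame is mapped to the distances between pairs of selected atoms, and those distances then serve as the PCA variables. The function name `distance_pairs`, the use of all pairs within the selection, and the toy coordinates are assumptions of this sketch and do not describe how JED selects pairs.

```python
import math
from itertools import combinations

def distance_pairs(frame, selection):
    """Map one frame to the distances between every pair of selected atoms.

    frame: list of (x, y, z) tuples, one per atom (for example alpha carbons).
    selection: indices of the atoms to include.
    """
    return [math.dist(frame[i], frame[j])
            for i, j in combinations(selection, 2)]

# Toy frame with four atoms; only the first three are selected.
frame = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0), (1.0, 1.0, 1.0)]
dp = distance_pairs(frame, [0, 1, 2])
```

Distances are unchanged by rigid-body translation and rotation, so this representation does not require structural alignment of the frames. The number of variables grows quadratically with the size of the selection, which is one reason to restrict the analysis to a chosen subset of atoms.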