Department of Bioinformatics and Telemedicine, Faculty of Medicine, Jagiellonian University Medical College, Medyczna 7, 30-688 Kraków, Poland.
Biomolecules. 2023 Feb 17;13(2):385. doi: 10.3390/biom13020385.
In this paper, we present an update to the ellipsoid profile algorithm (EP), a simple technique for measuring the globularity of protein structures without calculating molecular surfaces. In this context, globularity is understood as the ability of the molecule to fill a minimum volume enclosing ellipsoid (MVEE) that approximates its assumed globular shape: the larger the part of the ellipsoid's interior occupied by the protein's atoms, the better its globularity metrics. These metrics are derived by comparing the volume of the voxelized representation of the atoms with the volume of all voxels that fit inside the ellipsoid (the voxels form a uniform lattice of 1 Å cubes). The so-called ellipsoid profile shows how globularity changes with distance from the center. Two of its values, the so-called ellipsoid indexes, are used to classify a structure as globular, semi-globular, or non-globular. Here, we enhance the EP workflow with an improved outlier detection subroutine based on principal component analysis (PCA). It robustly distinguishes the dense parts of a molecule from, for example, disordered chain fragments fully exposed to the solvent. The PCA-based method replaces the previous approach based on kernel density estimation. The improved EP algorithm was tested on 2124 representatives of domain superfamilies from SCOP 2.08. The second part of this work is dedicated to a survey of the globularity of 3594 representatives of biological assemblies (monomers and complexes of up to 60 chains) from molecules currently deposited in the PDB and analyzed by the 3DComplex database.
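The globularity measure above is described only in words, so the following is a minimal sketch of the idea, not the EP implementation. Several choices are our own assumptions: the MVEE is computed with Khachiyan's algorithm on atom centers; every atom is a sphere with one hypothetical radius of 1.8 Å instead of element-specific radii; a voxel counts as occupied when its center lies inside an atom sphere; and the "profile" is taken as the occupied fraction inside the MVEE shrunk by a factor s around its center. The function names (`mvee`, `occupancy_grid`, `ellipsoid_profile`) are hypothetical. Neither the ellipsoid indexes nor the PCA-based outlier filter are reproduced here.

```python
import numpy as np


def mvee(points, tol=1e-3, max_iter=5000):
    """Approximate minimum volume enclosing ellipsoid (Khachiyan's algorithm).

    Returns (A, c) describing {x : (x - c)^T A (x - c) <= 1}.
    """
    n, d = points.shape
    Q = np.vstack([points.T, np.ones(n)])      # lifted points, shape (d+1, n)
    u = np.full(n, 1.0 / n)                    # one weight per point
    for _ in range(max_iter):
        X = (Q * u) @ Q.T                      # weighted moment matrix (d+1, d+1)
        # M[i] = q_i^T X^-1 q_i: how far point i lies outside the current ellipsoid
        M = np.einsum("ij,ji->i", Q.T @ np.linalg.inv(X), Q)
        j = int(np.argmax(M))                  # most outlying point
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step                       # move weight towards point j
        done = np.linalg.norm(new_u - u) < tol
        u = new_u
        if done:
            break
    c = points.T @ u                           # ellipsoid center
    A = np.linalg.inv((points.T * u) @ points - np.outer(c, c)) / d
    return A, c


def occupancy_grid(coords, radius, origin, shape):
    """Boolean grid of 1 Å voxels whose centers lie within `radius` of an atom."""
    occ = np.zeros(shape, dtype=bool)
    axes = [origin[k] + 0.5 + np.arange(shape[k]) for k in range(3)]  # voxel centers
    for x in coords:
        lo = np.maximum(np.floor(x - radius - origin).astype(int), 0)
        hi = np.minimum(np.ceil(x + radius - origin).astype(int) + 1, shape)
        gx, gy, gz = np.meshgrid(*(axes[k][lo[k]:hi[k]] for k in range(3)), indexing="ij")
        inside = (gx - x[0]) ** 2 + (gy - x[1]) ** 2 + (gz - x[2]) ** 2 <= radius ** 2
        occ[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] |= inside
    return occ


def ellipsoid_profile(coords, radius=1.8, n_shells=10):
    """Occupied-voxel fraction inside the MVEE scaled by s, for s in (0, 1]."""
    A, c = mvee(coords)
    ext = np.sqrt(np.diag(np.linalg.inv(A)))   # half-widths of the MVEE bounding box
    origin = np.floor(np.minimum(c - ext, coords.min(axis=0) - radius))
    upper = np.ceil(np.maximum(c + ext, coords.max(axis=0) + radius))
    shape = tuple(int(v) for v in upper - origin)
    occ = occupancy_grid(coords, radius, origin, shape)
    axes = [origin[k] + 0.5 + np.arange(shape[k]) for k in range(3)]
    g = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1) - c
    q = np.einsum("...i,ij,...j->...", g, A, g)  # q <= 1 inside the MVEE
    scales = np.linspace(1.0 / n_shells, 1.0, n_shells)
    profile = [occ[q <= s * s].mean() if np.any(q <= s * s) else np.nan for s in scales]
    return scales, np.array(profile)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(scale=5.0, size=(500, 3))  # toy "atom" cloud
    for s, p in zip(*ellipsoid_profile(pts)):
        print(f"s={s:.1f}  occupied fraction={p:.2f}")
```

With this sketch, a compact ball of atoms gives a last profile value close to 1. Appending a short, extended "tail" of atoms forces the MVEE to stretch, and that value drops sharply. This illustrates why solvent-exposed, disordered fragments are worth detecting before the ellipsoid is fitted.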