De-Alarcón Pedro A, Pascual-Montano Alberto, Gupta Amarnath, Carazo Jose M
Biocomputing Unit, Centro Nacional de Biotecnologia (CSIC), Campus UAM, Cantoblanco, 28049 Madrid, Spain.
Biophys J. 2002 Aug;83(2):619-32. doi: 10.1016/S0006-3495(02)75196-5.
In the present work we develop an efficient way of representing the geometry and topology of medium- to low-resolution volumetric datasets of biological structures, with the aim of storing and querying them in a database framework. We use a new vector quantization algorithm to select the points within the macromolecule that best approximate the probability density function of the original volume data. Connectivity among points is obtained using the theory of alpha shapes. This novel data representation has several useful characteristics: 1) it allows us to automatically segment and quantify important structural features of low-resolution maps, such as cavities and channels, which opens the possibility of querying large collections of maps on the basis of these quantitative structural features; 2) it is compact in size; 3) it contains a subset of three-dimensional points that optimally quantify the densities of medium-resolution data; and 4) a general model of the geometry and topology of the macromolecule (as opposed to a spatially unrelated collection of voxels) is easily obtained using the theory of alpha shapes.
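To make the point-selection step concrete, the following is a minimal sketch of density-weighted vector quantization. It is not the algorithm introduced in the paper, which the abstract does not specify; it substitutes a plain density-weighted Lloyd (k-means) iteration as a generic stand-in. The function names `dist2` and `quantize_density`, the farthest-point initialisation, and the toy input are all assumptions made for illustration. The alpha-shape connectivity step is not covered here.

```python
def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_density(voxels, densities, k, iters=50):
    """Place k code vectors so they follow the density of a volume.

    Hypothetical helper: a density-weighted Lloyd iteration, used here as a
    stand-in for the (unspecified) vector quantization algorithm of the paper.
    voxels    -- list of (x, y, z) coordinates
    densities -- non-negative density value for each voxel
    """
    # Deterministic initialisation (an assumption): start at the densest
    # voxel, then repeatedly add the voxel farthest from all chosen codes.
    first = max(range(len(voxels)), key=lambda i: densities[i])
    codes = [list(voxels[first])]
    while len(codes) < k:
        far = max(voxels, key=lambda v: min(dist2(v, c) for c in codes))
        codes.append(list(far))

    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        mass = [0.0] * k
        for v, w in zip(voxels, densities):
            # Assign each voxel to its nearest code vector.
            j = min(range(k), key=lambda i: dist2(v, codes[i]))
            mass[j] += w
            for d in range(3):
                sums[j][d] += w * v[d]
        # Move each code vector to the density-weighted centroid of its
        # voxels; a code vector with no mass keeps its previous position.
        new_codes = [
            [s / mass[j] for s in sums[j]] if mass[j] > 0 else codes[j]
            for j in range(k)
        ]
        if new_codes == codes:  # converged
            break
        codes = new_codes
    return codes
```

Called on a list of voxel coordinates and their density values, `quantize_density(voxels, densities, k)` returns `k` points that concentrate where the density is high. In a fuller pipeline these points would be the input to an alpha-shape computation that supplies the connectivity.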