Can Tolga, Wang Yuan-Fang
Department of Computer Science, University of California at Santa Barbara, 93106, USA.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:169-79.
We present a new method for protein structure similarity searches that improves on the accuracy, robustness, and efficiency of some existing techniques. Our method is grounded in differential geometry, specifically the theory of 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. To improve matching accuracy, we smooth the noisy raw atomic coordinates by spline fitting. To improve matching efficiency, we adopt a hierarchical coarse-to-fine strategy. We use an efficient hashing-based technique to screen out unlikely candidates, and we perform detailed pairwise alignments only for the small number of candidates that survive screening. Unlike other hashing-based techniques, ours uses domain-specific information (not just geometric information) to construct the hash key and is therefore better tuned to the biological domain. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to use a well-known local sequence alignment algorithm to align two protein structures. One measure of the efficacy of the proposed technique is that we were able to discover new, meaningful motifs that other structure alignment methods did not report.
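The abstract combines two computational ideas that are easy to illustrate in isolation: a per-residue shape signature derived from the differential geometry of the backbone curve, and a local *sequence* alignment run over those signatures instead of over amino-acid letters. The sketch below is not the authors' implementation. It assumes that the signature is the pair (curvature, torsion) at each point of the backbone trace, estimated with central finite differences on raw coordinates (the spline smoothing and the hashing prefilter are omitted), and that the "well-known local sequence alignment algorithm" is Smith-Waterman with a linear gap penalty. The function names `curvature_torsion` and `local_align`, the pair score `1 - dist/tol`, and the parameters `tol` and `gap` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Signature = Tuple[float, float]  # (curvature, torsion) at one residue


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def curvature_torsion(points: Sequence[Vec]) -> List[Signature]:
    """Hypothetical signature: (curvature, torsion) at each interior point
    i = 2 .. n-3, with derivatives replaced by central finite differences.
    Raw points are used directly; no spline smoothing is applied here."""
    sig: List[Signature] = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p, pp1, pp2 = points[i - 2:i + 3]
        d1 = tuple((pp1[k] - pm1[k]) / 2 for k in range(3))        # ~ r'
        d2 = tuple(pp1[k] - 2 * p[k] + pm1[k] for k in range(3))    # ~ r''
        d3 = tuple((pp2[k] - 2 * pp1[k] + 2 * pm1[k] - pm2[k]) / 2
                   for k in range(3))                                # ~ r'''
        c = _cross(d1, d2)
        c2 = _dot(c, c)
        if c2 == 0.0:
            sig.append((0.0, 0.0))  # locally straight: torsion undefined, use 0
            continue
        kappa = math.sqrt(c2) / math.sqrt(_dot(d1, d1)) ** 3  # |r' x r''| / |r'|^3
        tau = _dot(c, d3) / c2  # (r' x r'') . r''' / |r' x r''|^2
        sig.append((kappa, tau))
    return sig


def local_align(a: Sequence[Signature], b: Sequence[Signature],
                tol: float = 0.5, gap: float = 1.0):
    """Smith-Waterman local alignment over (kappa, tau) signatures.
    A residue pair scores 1 - dist/tol, where dist is the L1 distance between
    the two signatures, so it is positive only when dist < tol.
    Gaps cost a linear penalty `gap`. Returns (best score, aligned index pairs)."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    ptr = [[0] * (m + 1) for _ in range(n + 1)]  # 0 stop, 1 diag, 2 up, 3 left
    best, best_ij = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1][0] - b[j - 1][0]) + abs(a[i - 1][1] - b[j - 1][1])
            options = (
                (0.0, 0),                                 # start a new local alignment
                (H[i - 1][j - 1] + 1.0 - dist / tol, 1),  # pair residue i-1 with j-1
                (H[i - 1][j] - gap, 2),                   # gap in b
                (H[i][j - 1] - gap, 3),                   # gap in a
            )
            # max() keeps the first maximum, so ties fall back to "stop"
            H[i][j], ptr[i][j] = max(options, key=lambda o: o[0])
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    # Trace back from the best cell until a "stop" pointer is reached
    pairs = []
    i, j = best_ij
    while ptr[i][j] != 0:
        if ptr[i][j] == 1:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif ptr[i][j] == 2:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]
```

Because curvature and torsion depend only on the shape of the curve, these signatures do not change when the coordinates are rotated or translated, so two structures can be compared as strings without first superimposing them. On a sampled helix of radius 1 and pitch parameter 0.5, the estimates approach the analytic values of 0.8 for curvature and 0.4 for torsion.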