On sufficient statistics of least-squares superposition of vector sets.

Author information

Konagurthu Arun S, Kasarapu Parthan, Allison Lloyd, Collier James H, Lesk Arthur M

Affiliations

1. Clayton School of Computer Science and Information Technology, Faculty of Information Technology, Monash University, Clayton, Australia.

2. The Huck Institute of Genomics, Proteomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania.

Publication information

J Comput Biol. 2015 Jun;22(6):487-97. doi: 10.1089/cmb.2014.0154. Epub 2015 Feb 19.

Abstract

Superposing two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem and shows that these statistics are additive. Additivity permits a highly efficient (constant-time) computation of the superposition (and sufficient statistics) of any vector set composed from constituent vector sets by addition or deletion operations, provided the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have previously been superposed). This drastically improves the run time of methods that commonly superpose vector sets under addition or deletion operations, which previously carried out each such superposition ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
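
The additivity idea can be illustrated with a minimal Python sketch. The abstract does not list the statistics themselves, so the particular set used here (the count, the coordinate sums, the sums of squared norms, and the sum of outer products, as in the standard Kabsch-style formulation) is an assumption, and every name (`SuperpositionStats`, `stats_from_pairs`, `combine`, `centered_terms`) is hypothetical and unrelated to the authors' C++ library. The sketch stops at the centered cross-covariance matrix; recovering the optimal rotation from it (for example by SVD) is omitted.

```python
from dataclasses import dataclass, field


def _zeros3():
    return [0.0, 0.0, 0.0]


def _zeros3x3():
    return [[0.0, 0.0, 0.0] for _ in range(3)]


@dataclass
class SuperpositionStats:
    """Running sums over paired 3D vectors (x, y). Assumed statistic set."""
    n: int = 0
    sum_x: list = field(default_factory=_zeros3)     # sum of x
    sum_y: list = field(default_factory=_zeros3)     # sum of y
    sum_xx: float = 0.0                              # sum of |x|^2
    sum_yy: float = 0.0                              # sum of |y|^2
    sum_xy: list = field(default_factory=_zeros3x3)  # sum of outer products x y^T


def stats_from_pairs(pairs):
    """Accumulate the statistics from scratch: linear in the number of pairs."""
    s = SuperpositionStats()
    for x, y in pairs:
        s.n += 1
        for i in range(3):
            s.sum_x[i] += x[i]
            s.sum_y[i] += y[i]
            s.sum_xx += x[i] * x[i]
            s.sum_yy += y[i] * y[i]
            for j in range(3):
                s.sum_xy[i][j] += x[i] * y[j]
    return s


def combine(a, b, sign=1):
    """Add (sign=1) or delete (sign=-1) set b from set a.

    The cost is a fixed number of arithmetic operations, independent of
    how many vector pairs a and b summarize.
    """
    return SuperpositionStats(
        n=a.n + sign * b.n,
        sum_x=[a.sum_x[i] + sign * b.sum_x[i] for i in range(3)],
        sum_y=[a.sum_y[i] + sign * b.sum_y[i] for i in range(3)],
        sum_xx=a.sum_xx + sign * b.sum_xx,
        sum_yy=a.sum_yy + sign * b.sum_yy,
        sum_xy=[[a.sum_xy[i][j] + sign * b.sum_xy[i][j] for j in range(3)]
                for i in range(3)],
    )


def centered_terms(s):
    """Return (e0, cov) after moving both sets to their centroids.

    e0  is the sum of squared norms of the centered x and y vectors.
    cov is the 3x3 sum of outer products of centered x and centered y.
    """
    e0 = (s.sum_xx - sum(v * v for v in s.sum_x) / s.n) \
        + (s.sum_yy - sum(v * v for v in s.sum_y) / s.n)
    cov = [[s.sum_xy[i][j] - s.sum_x[i] * s.sum_y[j] / s.n for j in range(3)]
           for i in range(3)]
    return e0, cov


# Two already-summarized fragment pairs (made-up coordinates).
frag_a = [((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
          ((0.0, 2.0, 0.0), (-2.0, 0.0, 0.0))]
frag_b = [((0.0, 0.0, 3.0), (0.0, 0.0, 3.0))]

stats_a = stats_from_pairs(frag_a)
stats_b = stats_from_pairs(frag_b)
merged = combine(stats_a, stats_b)              # addition
restored = combine(merged, stats_b, sign=-1)    # deletion
e0, cov = centered_terms(merged)
```

In this sketch, `merged` matches what `stats_from_pairs(frag_a + frag_b)` would produce, but it is obtained without revisiting any coordinates, which is the property the abstract describes for fragment-assembly workloads.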
