Department of Biochemistry, Brandeis University, MS009, Waltham, MA 02454, USA.
Bioinformatics. 2012 Aug 1;28(15):1972-9. doi: 10.1093/bioinformatics/bts243. Epub 2012 Apr 27.
Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether.
Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case.
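The least-squares special case can be illustrated with a minimal sketch. This is not the THESEUS implementation: it works in two dimensions (where the optimal rotation has a closed form), treats all points as equally weighted, and uses hypothetical names (`fit_rigid_2d`, `em_superpose`, `observed_rmsd`). It assumes missing points are marked as `None`, that every position is observed in at least one structure, and that each structure shares at least two positions with the most complete one. The E-step fills each missing point with its expected value, the current mean position; the M-step re-superposes the completed structures and re-estimates the mean.

```python
import math

def fit_rigid_2d(src, dst):
    """Closed-form 2D least-squares rotation + translation mapping src onto dst."""
    n = len(src)
    sx0 = sum(p[0] for p in src) / n
    sy0 = sum(p[1] for p in src) / n
    dx0 = sum(q[0] for q in dst) / n
    dy0 = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        sx, sy, dx, dy = sx - sx0, sy - sy0, dx - dx0, dy - dy0
        dot += sx * dx + sy * dy
        cross += sx * dy - sy * dx
    theta = math.atan2(cross, dot)  # angle maximizing the overlap of centered sets
    c, s = math.cos(theta), math.sin(theta)
    return c, s, dx0 - (c * sx0 - s * sy0), dy0 - (s * sx0 + c * sy0)

def apply_rigid_2d(t, p):
    """Apply a (cos, sin, tx, ty) transform to a single 2D point."""
    c, s, tx, ty = t
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def em_superpose(structures, n_iter=100):
    """Superpose 2D structures whose missing points are given as None.

    Returns (aligned, mean): the completed, superposed structures and the
    mean structure, all in a common frame.
    """
    masks = [[p is not None for p in s] for s in structures]
    n_pos = len(masks[0])
    # Initialization: align each structure to the most complete one,
    # using only the positions observed in both.
    ref = max(range(len(structures)), key=lambda k: sum(masks[k]))
    aligned = []
    for s, m in zip(structures, masks):
        shared = [i for i in range(n_pos) if m[i] and masks[ref][i]]
        t = fit_rigid_2d([s[i] for i in shared],
                         [structures[ref][i] for i in shared])
        aligned.append([apply_rigid_2d(t, p) if p is not None else None
                        for p in s])
    # Initial mean: average over the structures that observe each position.
    mean = []
    for i in range(n_pos):
        obs = [a[i] for a, m in zip(aligned, masks) if m[i]]
        mean.append((sum(p[0] for p in obs) / len(obs),
                     sum(p[1] for p in obs) / len(obs)))
    for _ in range(n_iter):
        updated = []
        for a, m in zip(aligned, masks):
            # E-step: replace each missing point by the current mean position.
            completed = [p if ok else mu for p, ok, mu in zip(a, m, mean)]
            # M-step (part 1): superpose the completed structure on the mean.
            t = fit_rigid_2d(completed, mean)
            updated.append([apply_rigid_2d(t, p) for p in completed])
        aligned = updated
        # M-step (part 2): the mean is the plain average of completed structures.
        k = len(aligned)
        mean = [(sum(a[i][0] for a in aligned) / k,
                 sum(a[i][1] for a in aligned) / k) for i in range(n_pos)]
    return aligned, mean

def observed_rmsd(aligned, mean, structures):
    """RMSD from the mean, counting only points observed in the input."""
    sq, n = 0.0, 0
    for a, s in zip(aligned, structures):
        for p, raw, mu in zip(a, s, mean):
            if raw is not None:
                sq += (p[0] - mu[0]) ** 2 + (p[1] - mu[1]) ** 2
                n += 1
    return math.sqrt(sq / n)
```

Because imputed points sit exactly on the mean, they contribute no residual of their own, so every observed point influences the fit even when it is absent from other structures. Discarding all positions that are not shared by every structure would throw that information away.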
The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org.
Supplementary data are available at Bioinformatics online.