Fast 3D shape screening of large chemical databases through alignment-recycling.

Authors

Fontaine Fabien, Bolton Evan, Borodina Yulia, Bryant Stephen H

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA.

Publication details

Chem Cent J. 2007 Jun 6;1:12. doi: 10.1186/1752-153X-1-12.

Abstract

BACKGROUND

Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes.

RESULTS

Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures that entirely covers the 3D shape space of the dataset's conformers. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to reduce the CPU time required for shape overlay drastically (100-fold). The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average.
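To make the "hit list overlap" figure concrete, here is a minimal Python sketch of one plausible way to compute it. The abstract does not define the metric, so the definition used here (the fraction of de novo hits that also appear in the alignment-recycling hit list) and the function name `hit_list_overlap` are assumptions for illustration only.

```python
def hit_list_overlap(de_novo_hits, recycled_hits):
    """Fraction of de novo hits also found by alignment-recycling.

    Assumed definition (not taken from the paper): both arguments are
    iterables of compound or conformer identifiers, and the overlap is
    measured relative to the de novo hit list.
    """
    reference = set(de_novo_hits)
    if not reference:
        raise ValueError("the de novo hit list must not be empty")
    shared = reference & set(recycled_hits)
    return len(shared) / len(reference)


if __name__ == "__main__":
    # Hypothetical identifiers, purely illustrative.
    de_novo = ["c1", "c2", "c3", "c4", "c5"]
    recycled = ["c1", "c2", "c3", "c4", "c9"]
    print(hit_list_overlap(de_novo, recycled))
```

Averaging this value over many queries would give a summary figure comparable in spirit to the one reported, under the stated assumption about how overlap is defined.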

CONCLUSION

Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time needed to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape space; overlay of the database conformers onto the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation required by the first two steps is a significant cost of the method; however, once it is performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example to handle larger and more flexible small molecules, are discussed.
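The third step can be illustrated with a minimal Python sketch. It assumes that each stored overlay is a rigid transform (a 3x3 rotation matrix plus a translation vector) mapping a conformer into the frame of a shared reference shape. The representation and all function names are illustrative assumptions, not the authors' implementation. Under that assumption, the overlay of a database conformer onto the query follows from composing the stored conformer-to-reference transform with the inverse of the query-to-reference transform, with no further optimization.

```python
import math


def rot_z(angle):
    """Rotation matrix for a rotation about the z axis (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(transform, point):
    """Apply a rigid transform (rotation, translation) to a 3D point."""
    rot, trans = transform
    return tuple(
        sum(rot[i][k] * point[k] for k in range(3)) + trans[i]
        for i in range(3)
    )


def invert(transform):
    """Invert x' = R x + t, giving x = R^T x' - R^T t."""
    rot, trans = transform
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    new_trans = [
        -sum(rot_t[i][k] * trans[k] for k in range(3)) for i in range(3)
    ]
    return rot_t, new_trans


def compose(outer, inner):
    """Return the transform that applies `inner` first, then `outer`."""
    r_out, t_out = outer
    r_in, t_in = inner
    rot = [
        [sum(r_out[i][k] * r_in[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
    trans = [
        sum(r_out[i][k] * t_in[k] for k in range(3)) + t_out[i]
        for i in range(3)
    ]
    return rot, trans


def recycled_alignment(query_to_ref, conformer_to_ref):
    """Map a database conformer into the query frame via a shared reference.

    Both inputs are precomputed overlays onto the same reference shape
    (an assumption of this sketch); no new optimization is performed.
    """
    return compose(invert(query_to_ref), conformer_to_ref)


if __name__ == "__main__":
    # Hypothetical stored overlays onto one reference shape.
    query_to_ref = (rot_z(math.pi / 2), [1.0, 0.0, 0.0])
    conformer_to_ref = (rot_z(math.pi / 6), [0.0, 2.0, 0.0])
    to_query = recycled_alignment(query_to_ref, conformer_to_ref)
    print(apply(to_query, (1.0, 2.0, 3.0)))
```

The design point is that the expensive overlays are computed once per conformer and per reference shape. At query time only the query's own overlay onto the references and a cheap matrix composition per database conformer are needed, which is consistent with the speed-up the abstract describes.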

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/18dc/1994057/10e7120ef86d/1752-153X-1-12-1.jpg
