uQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data.

Authors

Adamczak Rafal, Meller Jarek

Affiliations

Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland.

Departments of Environmental Health and Electrical Engineering & Computing Systems, University of Cincinnati, Cincinnati, USA.

Publication

BMC Bioinformatics. 2016 Dec 28;17(1):546. doi: 10.1186/s12859-016-1381-2.

Abstract

BACKGROUND

Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also call for large-scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and the development of efficient software solutions.

RESULTS

uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust.
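The abstract does not spell out how an "implicit comparison of all pairs" can run in linear time, so the following Python sketch illustrates one way the idea may work. It assumes each model is encoded as a fixed-length string of per-residue states (for example secondary-structure letters). The function names `rank_by_consensus` and `cluster_by_hash` are hypothetical and are not taken from the uQlust package. Counting state frequencies per position lets each model be scored against all others without enumerating pairs, and using the profile itself as a dictionary key is a simplified stand-in for profile hashing.

```python
from collections import Counter, defaultdict


def rank_by_consensus(profiles):
    """Score each profile by its mean per-position agreement with all profiles.

    Runs in O(N * L) for N profiles of length L, yet each score equals the
    average fraction of matching positions against every profile in the set
    (the profile itself included), which a naive loop would compute in O(N^2 * L).
    """
    if not profiles:
        return []
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all profiles must have the same length")

    # First pass: count how often each state occurs at each position.
    counts = [Counter() for _ in range(length)]
    for profile in profiles:
        for i, state in enumerate(profile):
            counts[i][state] += 1

    # Second pass: a profile's score is the summed frequency of its own states.
    n = len(profiles)
    return [
        sum(counts[i][state] for i, state in enumerate(profile)) / (n * length)
        for profile in profiles
    ]


def cluster_by_hash(profiles):
    """Group indices of identical profiles using hashed dictionary keys.

    Illustrative assumption: clusters are exact-match buckets; a real tool
    may first coarsen or select profile columns before hashing.
    """
    buckets = defaultdict(list)
    for index, profile in enumerate(profiles):
        buckets[profile].append(index)
    # Largest buckets first; ties keep their original order.
    return sorted(buckets.values(), key=len, reverse=True)
```

With the toy input `["HHEC", "HHEC", "HHCC", "EECC"]`, the two identical profiles fall into one bucket, and the outlier `"EECC"` receives the lowest consensus score. Both steps touch each profile position a constant number of times, which is why memory and time grow linearly with the number of models in this sketch.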

CONCLUSION

uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
