SketchSort: Fast All Pairs Similarity Search for Large Databases of Molecular Fingerprints.

Affiliations

Minato Discrete Structure Manipulation System Project, ERATO, Japan Science and Technology Agency, Sapporo, 060-0814, Japan.

Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, 135-0064, Japan.

Publication information

Mol Inform. 2011 Sep;30(9):801-7. doi: 10.1002/minf.201100050. Epub 2011 Jul 12.

Abstract

Similarity networks of ligands have often been reported to be useful for predicting chemical activities and target proteins. However, the naive method of computing all pairwise similarities of chemical fingerprints takes quadratic time, which is prohibitive for large-scale databases with millions of ligands. We propose a fast all-pairs similarity search method, called SketchSort, that maps chemical fingerprints to symbol strings using random projections and finds similar strings by multiple masked sorting. Because of the random projection, SketchSort misses a certain fraction of neighbors (i.e., false negatives). Nevertheless, the expected fraction of false negatives is derived theoretically and can be kept below a very small value. Experiments show that SketchSort is much faster than other similarity search methods and enables us to obtain a PubChem-scale similarity network quickly.
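The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the general idea, not the authors' SketchSort implementation. It assumes sign random projections (random-hyperplane hashing: each bit records which side of a Gaussian hyperplane a fingerprint falls on) as the mapping to symbol strings, and a pigeonhole block-masking scheme as the "multiple masked sorting". All function names and parameters (`n_bits`, `n_blocks`, `max_hamming`, `min_cosine`) are hypothetical, and duplicate pairs are removed with a Python set rather than any specific check from the paper. `miss_probability` illustrates why the false-negative rate can be analyzed: for two vectors at angle θ, each random-hyperplane bit disagrees independently with probability θ/π, so the chance that one sketch exceeds the Hamming threshold is a binomial tail. The paper's own derivation and parameter choices may differ.

```python
import itertools
import math
import random


def sign_sketch(vec, hyperplanes):
    # One bit per random hyperplane: 1 if vec lies on its non-negative side.
    return tuple(int(sum(v * h for v, h in zip(vec, hp)) >= 0)
                 for hp in hyperplanes)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def masked_sort_candidates(sketches, n_blocks, max_hamming):
    """Index pairs (i < j) whose sketches differ in at most max_hamming bits.

    Differing bits touch at most max_hamming of the n_blocks blocks, so some
    choice of max_hamming masked blocks leaves the remaining blocks equal.
    For each mask we sort by the unmasked bits and compare only within runs
    of equal keys (hypothetical scheme, assumed for illustration)."""
    if not 0 <= max_hamming < n_blocks:
        raise ValueError("need 0 <= max_hamming < n_blocks")
    length = len(sketches[0])
    cuts = [round(b * length / n_blocks) for b in range(n_blocks + 1)]
    blocks = list(zip(cuts[:-1], cuts[1:]))
    found = set()  # a set drops pairs reported under several masks
    for masked in itertools.combinations(range(n_blocks), max_hamming):
        kept = [blk for b, blk in enumerate(blocks) if b not in masked]
        keys = [tuple(bit for lo, hi in kept for bit in s[lo:hi]) for s in sketches]
        order = sorted(range(len(sketches)), key=keys.__getitem__)
        for _, run in itertools.groupby(order, key=keys.__getitem__):
            for i, j in itertools.combinations(sorted(run), 2):
                if sum(a != b for a, b in zip(sketches[i], sketches[j])) <= max_hamming:
                    found.add((i, j))
    return found


def sketch_sort(vectors, n_bits=32, n_blocks=4, max_hamming=1,
                min_cosine=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(vectors[0])
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    sketches = [sign_sketch(v, hyperplanes) for v in vectors]
    candidates = masked_sort_candidates(sketches, n_blocks, max_hamming)
    # Verify each candidate against the exact cosine similarity.
    return sorted((i, j) for i, j in candidates
                  if cosine(vectors[i], vectors[j]) >= min_cosine)


def miss_probability(theta, n_bits, max_hamming):
    """P(one sketch of a pair at angle theta differs in > max_hamming bits),
    assuming each bit disagrees independently with probability theta / pi."""
    p = theta / math.pi
    hit = sum(math.comb(n_bits, i) * p ** i * (1 - p) ** (n_bits - i)
              for i in range(max_hamming + 1))
    return 1.0 - hit


if __name__ == "__main__":
    fps = [[1, 1, 0, 1, 0, 0, 1, 0],
           [1, 1, 0, 1, 0, 0, 1, 0],
           [0, 0, 1, 0, 1, 1, 0, 0]]
    print(sketch_sort(fps))  # [(0, 1)]
    print(miss_probability(math.pi / 8, 32, 4))
```

The design point is that sorting groups strings with identical unmasked bits in O(n log n) per mask, so only pairs inside a group are compared instead of all n(n-1)/2 pairs; the number of masks grows as C(`n_blocks`, `max_hamming`). Pairs whose sketches land too far apart are the false negatives that `miss_probability` estimates.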
