NIPALSTREE: a new hierarchical clustering approach for large compound libraries and its application to virtual screening.

Authors

Böcker Alexander, Schneider Gisbert, Teckentrup Andreas

Affiliation

Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Strasse 11, D-60439 Frankfurt, Germany.

Publication

J Chem Inf Model. 2006 Nov-Dec;46(6):2220-9. doi: 10.1021/ci050541d.

DOI:10.1021/ci050541d
PMID:17125166
Abstract

A hierarchical clustering algorithm, NIPALSTREE, was developed that is able to analyze large data sets in high-dimensional space. The result can be displayed as a dendrogram. At each tree level the algorithm projects a data set via principal component analysis onto one dimension. The data set is sorted according to this one dimension and split at the median position. To avoid distortion of clusters at the median position, the algorithm identifies a potentially more suited split point left or right of the median. The procedure is recursively applied on the resulting subsets until the maximal distance between cluster members exceeds a user-defined threshold. The approach was validated in a retrospective screening study for angiotensin converting enzyme (ACE) inhibitors. The resulting clusters were assessed for their purity and enrichment in actives belonging to this ligand class. Enrichment was observed in individual branches of the dendrogram. In further retrospective virtual screening studies employing the MDL Drug Data Report (MDDR), COBRA, and the SPECS catalog, NIPALSTREE was compared with the hierarchical k-means clustering approach. Results show that both algorithms can be used in the context of virtual screening. Intersecting the result lists obtained with both algorithms improved enrichment factors while losing only a few chemotypes.
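
The recursive procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions because the abstract does not specify them. The function names (`first_pc_nipals`, `split_index`, `nipalstree`, `leaves`) are hypothetical. The refined split point is taken as the largest gap between neighbouring scores inside a window around the median, and `window` is an invented parameter. The sketch also reads the stopping rule as "keep splitting a subset only while its maximal pairwise distance is above the threshold". The brute-force distance check needs quadratic memory, so it suits only small demonstrations.

```python
import numpy as np


def first_pc_nipals(X, n_iter=100, tol=1e-9):
    """Scores of X on its first principal component, computed with NIPALS."""
    Xc = X - X.mean(axis=0)
    # Start from the column with the largest variance.
    t = Xc[:, np.argmax(Xc.var(axis=0))].copy()
    for _ in range(n_iter):
        tt = t @ t
        if tt == 0.0:              # all points identical, nothing to project
            break
        p = Xc.T @ t / tt          # loading vector
        p /= np.linalg.norm(p)
        t_new = Xc @ p             # updated scores
        converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t


def max_pairwise_distance(X):
    """Largest Euclidean distance between any two rows (O(n^2) memory)."""
    diff = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).max())


def split_index(sorted_scores, window=0.1):
    """ASSUMPTION: pick the widest gap within +/- window*n of the median."""
    n = len(sorted_scores)
    mid = n // 2
    w = max(1, int(window * n))
    lo, hi = max(1, mid - w), min(n - 1, mid + w)
    gaps = np.diff(sorted_scores)          # gaps[i] lies between i and i + 1
    candidates = np.arange(lo, hi + 1)     # cut k separates [:k] from [k:]
    return int(candidates[np.argmax(gaps[candidates - 1])])


def nipalstree(X, idx=None, threshold=1.0):
    """Build a dendrogram as nested dicts by recursive 1-D PCA splits."""
    if idx is None:
        idx = np.arange(len(X))
    # ASSUMPTION: a subset becomes a leaf once its spread is within threshold.
    if len(idx) < 2 or max_pairwise_distance(X[idx]) <= threshold:
        return {"members": idx.tolist()}
    scores = first_pc_nipals(X[idx])
    order = np.argsort(scores, kind="stable")
    cut = split_index(scores[order])
    return {
        "members": idx.tolist(),
        "children": [
            nipalstree(X, idx[order[:cut]], threshold),
            nipalstree(X, idx[order[cut:]], threshold),
        ],
    }


def leaves(node):
    """Collect the member lists of all terminal clusters."""
    if "children" not in node:
        return [node["members"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

Calling `nipalstree(X, threshold=...)` on a descriptor matrix `X` (rows are compounds, columns are descriptors) returns the tree, and `leaves(tree)` lists the row indices of each terminal cluster. Every split leaves at least one point on each side, so the recursion always terminates.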

