
An Open-Source Implementation of the Scaffold Identification and Naming System (SCINS) and Example Applications.

Affiliations

Pangea Bio, Pangea Botanica GmbH, Hardenbergstrasse 32, 10623 Berlin, Germany.

Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, CB2 1EW Cambridge, United Kingdom.

Publication information

J Chem Inf Model. 2024 Oct 28;64(20):7905-7916. doi: 10.1021/acs.jcim.4c01314. Epub 2024 Oct 15.

DOI:10.1021/acs.jcim.4c01314
PMID:39404472
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11523071/
Abstract

Organizing and partitioning sets of chemical structures is of considerable practical significance, e.g., in compound library analysis and the postprocessing of screening hit lists. Approaches such as unsupervised clustering are computationally demanding and dataset-dependent; on the other hand, rule-based methods, such as those based on Murcko scaffolds, have linear time complexity but are often too fine-grained, leading to a large number of singletons or sparsely populated classes. An alternative rule-based method that seeks to achieve an optimal balance when grouping compounds into sets is the 'Scaffold Identification and Naming System' (SCINS). To facilitate public use of this previously published method, here, we provide an open-source Python implementation of SCINS, dependent only on RDKit. We show that SCINS can be useful in identifying sparsely and densely populated regions in chemical space in large databases, here exemplified with Enamine REAL Diverse and ChEMBL. We find that Enamine REAL Diverse covers a much smaller SCINS space relative to ChEMBL, whereas the opposite is true when Murcko and generic Murcko scaffolds are considered. Additionally, we show that SCINS can result in chemically intuitive grouping of medium-sized sets of bioactive compounds, which can be useful in compound selection from virtual screening campaigns as well as postprocessing of experimental hit lists. Hence, in this work, we provide both an open-source implementation of SCINS and its characterization with relevant use cases.
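
The contrast the abstract draws between Murcko and generic Murcko scaffolds can be illustrated with a short sketch. It does not implement SCINS and is not the authors' code; it only shows the rule-based baseline, assuming RDKit's `MurckoScaffold` module is installed. The helper names (`group_by_key`, `singleton_fraction`, `murcko_key`, `generic_murcko_key`) and the four toy SMILES are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

try:  # RDKit is optional here so the grouping helpers still run without it
    from rdkit import Chem
    from rdkit.Chem.Scaffolds import MurckoScaffold
except ImportError:
    Chem = None


def group_by_key(items, key_fn):
    """Rule-based partitioning in one pass: each item maps to exactly one class."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)


def singleton_fraction(groups):
    """Fraction of classes that hold a single member (0.0 for no classes)."""
    if not groups:
        return 0.0
    singletons = sum(1 for members in groups.values() if len(members) == 1)
    return singletons / len(groups)


def murcko_key(smiles):
    """Canonical SMILES of the Murcko scaffold (ring systems plus linkers)."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(mol))


def generic_murcko_key(smiles):
    """Murcko scaffold with every atom set to carbon and every bond to single."""
    mol = Chem.MolFromSmiles(smiles)
    scaffold = MurckoScaffold.GetScaffoldForMol(mol)
    return Chem.MolToSmiles(MurckoScaffold.MakeScaffoldGeneric(scaffold))


if Chem is not None:
    # Hypothetical toy set: toluene, phenol, 4-methylpyridine, cyclohexanol
    toy_smiles = ["Cc1ccccc1", "Oc1ccccc1", "Cc1ccncc1", "OC1CCCCC1"]
    for name, key_fn in [("Murcko", murcko_key),
                         ("generic Murcko", generic_murcko_key)]:
        groups = group_by_key(toy_smiles, key_fn)
        print(name, len(groups), "classes; singleton fraction:",
              singleton_fraction(groups))
```

Because each molecule is assigned a key independently, the partitioning is a single pass over the dataset, which is the linear-time property the abstract attributes to rule-based methods. Any other function that maps a molecule to a string could be passed as `key_fn` in the same way.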

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/6e5daea3fef0/ci4c01314_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/38b5a5126c4d/ci4c01314_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/5efa202b8cf2/ci4c01314_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/fe2700d2040f/ci4c01314_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/e65e20bc56ac/ci4c01314_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/28ec174fc763/ci4c01314_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/366ca2fda067/ci4c01314_0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/55b9df9e14b3/ci4c01314_0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/11523071/312b5d9a5ccd/ci4c01314_0009.jpg
