Database fingerprint (DFP): an approach to represent molecular databases.

Authors

Fernández-de Gortari Eli, García-Jacas César R, Martinez-Mayorga Karina, Medina-Franco José L

Affiliations

Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico City, Mexico.

Instituto de Química, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico City, Mexico.

Publication information

J Cheminform. 2017 Feb 6;9:9. doi: 10.1186/s13321-017-0195-1. eCollection 2017.

DOI:10.1186/s13321-017-0195-1
PMID:28224019
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5293704/
Abstract

BACKGROUND

Molecular fingerprints are widely used in several areas of chemoinformatics, including diversity analysis and similarity searching. The fingerprint-based analysis of chemical libraries, in particular of large collections, usually requires a molecular representation of each compound in the library, which may lead to issues of storage space and redundant calculations. In fact, information redundancy is inherent to the data, resulting in bit positions in the fingerprint that carry no significant information.
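To make the redundancy point concrete, the following is a minimal sketch, not taken from the paper, that computes the Shannon entropy of each bit position across a library. The function names and the toy 4-bit library are illustrative assumptions; a position that is constant across all compounds has zero entropy and so carries no information.

```python
import math

def bit_entropy(column):
    """Shannon entropy (in bits) of one fingerprint position across a library."""
    p = sum(column) / len(column)  # fraction of compounds with this bit set
    if p in (0.0, 1.0):
        return 0.0  # constant position: no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def per_bit_entropy(fingerprints):
    """Entropy of every bit position for a list of equal-length 0/1 lists."""
    return [bit_entropy(col) for col in zip(*fingerprints)]

# Toy library (made-up data): 4 compounds, 4-bit fingerprints.
library = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]
entropies = per_bit_entropy(library)
# Positions 0 and 1 are constant, so their entropy is 0.0;
# position 2 is set in half of the compounds, so its entropy is 1.0.
```

In this toy case two of the four positions could be dropped without losing any ability to distinguish compounds, which is the kind of redundancy the background paragraph describes.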

RESULTS

Herein, a general approach is proposed to represent an entire compound library with a single binary fingerprint. The development of the database fingerprint (DFP) is illustrated first using a short fingerprint (MACCS keys) for 10 data sets of general interest in chemistry. The application of the DFP is further shown with PubChem fingerprints for the data sets used in the primary example but with a larger number of compounds, up to 25,000 molecules. The performance of the DFP was studied through differential Shannon entropy, k-means clustering, and DFP/Tanimoto similarity.
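As a rough illustration of the idea, the sketch below collapses a library into one binary fingerprint and compares two libraries with the Tanimoto coefficient. The abstract does not state the construction rule, so the frequency-threshold rule used here is an assumption, as are the function names, the `threshold=0.5` default, and the toy data.

```python
def database_fingerprint(fingerprints, threshold=0.5):
    """Collapse a library of equal-length 0/1 lists into one binary fingerprint.

    Assumption for illustration: a bit is set when the fraction of compounds
    having it reaches `threshold`. The abstract does not specify this rule.
    """
    n = len(fingerprints)
    return [1 if sum(col) / n >= threshold else 0 for col in zip(*fingerprints)]

def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints: |a AND b| / |a OR b|."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    # Returning 0.0 for two all-zero fingerprints is a convention chosen here.
    return both / either if either else 0.0

# Toy libraries (made-up data): 3 compounds each, 4-bit fingerprints.
lib_a = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
lib_b = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 0, 1, 1]]

dfp_a = database_fingerprint(lib_a)  # [1, 1, 0, 0]
dfp_b = database_fingerprint(lib_b)  # [1, 0, 1, 1]
similarity = tanimoto(dfp_a, dfp_b)  # 0.25
```

Under this assumed rule, comparing two libraries costs a single fingerprint comparison rather than one comparison per compound pair.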

CONCLUSIONS

The DFP is designed to capture key information of the compound collection and can be used to compare and assess the diversity of molecular libraries. This Preliminary Communication shows the potential of the novel fingerprint to characterize inter-library relationships. A major future goal is to apply the DFP to virtual screening and to develop DFPs for other data sets based on several different types of fingerprints.

GRAPHICAL ABSTRACT

Database fingerprint captures the key information of molecular databases to perform chemical space characterization and virtual screening.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/7558cba10582/13321_2017_195_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/8a7182377ebb/13321_2017_195_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/babbf11a0a09/13321_2017_195_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/25b6bfd98287/13321_2017_195_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/902b6db82858/13321_2017_195_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/923cdd688ff7/13321_2017_195_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd19/5293704/c237503c214b/13321_2017_195_Fig5_HTML.jpg

Similar articles

1
Database fingerprint (DFP): an approach to represent molecular databases.
J Cheminform. 2017 Feb 6;9:9. doi: 10.1186/s13321-017-0195-1. eCollection 2017.
2
Statistical-based database fingerprint: chemical space dependent representation of compound databases.
J Cheminform. 2018 Nov 22;10(1):55. doi: 10.1186/s13321-018-0311-x.
3
Shannon entropy-based fingerprint similarity search strategy.
J Chem Inf Model. 2009 Jul;49(7):1687-91. doi: 10.1021/ci900159f.
4
Design and evaluation of a molecular fingerprint involving the transformation of property descriptor values into a binary classification scheme.
J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1151-7. doi: 10.1021/ci030285+.
5
Modeling Tanimoto Similarity Value Distributions and Predicting Search Results.
Mol Inform. 2017 Jul;36(7). doi: 10.1002/minf.201600131. Epub 2016 Dec 29.
6
How do 2D fingerprints detect structurally diverse active compounds? Revealing compound subset-specific fingerprint features through systematic selection.
J Chem Inf Model. 2011 Sep 26;51(9):2254-65. doi: 10.1021/ci200275m. Epub 2011 Aug 8.
7
Profile scaling increases the similarity search performance of molecular fingerprints containing numerical descriptors and structural keys.
J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1218-25. doi: 10.1021/ci030287u.
8
Introduction of a generally applicable method to estimate retrieval of active molecules for similarity searching using fingerprints.
ChemMedChem. 2007 Sep;2(9):1311-20. doi: 10.1002/cmdc.200700090.
9
Analysis of the effects of related fingerprints on molecular similarity using an eigenvalue entropy approach.
J Cheminform. 2021 Mar 23;13(1):27. doi: 10.1186/s13321-021-00506-2.
10
Anatomy of fingerprint search calculations on structurally diverse sets of active compounds.
J Chem Inf Model. 2005 Nov-Dec;45(6):1812-9. doi: 10.1021/ci050276w.

Cited by

1
ElixirSeeker: A Machine Learning Framework Utilizing Fusion Molecular Fingerprints for the Discovery of Lifespan-Extending Compounds.
Aging Cell. 2025 Aug;24(8):e70116. doi: 10.1111/acel.70116. Epub 2025 May 26.
2
Machine Learning in Drug Development for Neurological Diseases: A Review of Blood Brain Barrier Permeability Prediction Models.
Mol Inform. 2025 Mar;44(3):e202400325. doi: 10.1002/minf.202400325.
3
Advances in Protein-Ligand Binding Affinity Prediction via Deep Learning: A Comprehensive Study of Datasets, Data Preprocessing Techniques, and Model Architectures.
Curr Drug Targets. 2024;25(15):1041-1065. doi: 10.2174/0113894501330963240905083020.
4
iSIM: instant similarity.
Digit Discov. 2024 May 7;3(6):1160-1171. doi: 10.1039/d4dd00041b. eCollection 2024 Jun 12.
5
Chemical Multiverse and Diversity of Food Chemicals.
J Chem Inf Model. 2024 Feb 26;64(4):1229-1244. doi: 10.1021/acs.jcim.3c01617. Epub 2024 Feb 14.
6
Discovering Potential Compounds for Venous Disease Treatment through Virtual Screening and Network Pharmacology Approach.
Molecules. 2023 Dec 5;28(24):7937. doi: 10.3390/molecules28247937.
7
Predicting pathways for old and new metabolites through clustering.
J Theor Biol. 2024 Feb 7;578:111684. doi: 10.1016/j.jtbi.2023.111684. Epub 2023 Dec 3.
8
Novel Thiosemicarbazone Quantum Dots in the Treatment of Alzheimer's Disease Combining In Silico Models Using Fingerprints and Physicochemical Descriptors.
ACS Omega. 2023 Mar 17;8(12):11076-11099. doi: 10.1021/acsomega.2c07934. eCollection 2023 Mar 28.
9
A KNIME Workflow to Assist the Analogue Identification for Read-Across, Applied to Aromatase Activity.
Molecules. 2023 Feb 15;28(4):1832. doi: 10.3390/molecules28041832.
10
Multi-stage virtual screening of natural products against p38α mitogen-activated protein kinase: predictive modeling by machine learning, docking study and molecular dynamics simulation.
Heliyon. 2022 Sep 1;8(9):e10495. doi: 10.1016/j.heliyon.2022.e10495. eCollection 2022 Sep.

References

1
MayaChemTools: An Open Source Package for Computational Drug Discovery.
J Chem Inf Model. 2016 Dec 27;56(12):2292-2297. doi: 10.1021/acs.jcim.6b00505. Epub 2016 Nov 16.
2
Consensus Diversity Plots: a global diversity analysis of chemical libraries.
J Cheminform. 2016 Nov 10;8:63. doi: 10.1186/s13321-016-0176-9. eCollection 2016.
3
Chemoinformatic expedition of the chemical space of fungal products.
Future Med Chem. 2016 Aug;8(12):1399-412. doi: 10.4155/fmc-2016-0079. Epub 2016 Aug 3.
4
Iterative Shannon Entropy - a Methodology to Quantify the Information Content of Value Range Dependent Data Distributions. Application to Descriptor and Compound Selectivity Profiling.
Mol Inform. 2010 May 17;29(5):432-40. doi: 10.1002/minf.201000029. Epub 2010 May 10.
5
The chemical space project.
Acc Chem Res. 2015 Mar 17;48(3):722-30. doi: 10.1021/ar500432k. Epub 2015 Feb 17.
6
IMMAN: free software for information theory-based chemometric analysis.
Mol Divers. 2015 May;19(2):305-19. doi: 10.1007/s11030-014-9565-z. Epub 2015 Jan 26.
7
Chemoinformatic characterization of activity and selectivity switches of antiprotozoal compounds.
Future Med Chem. 2014 Mar;6(3):281-94. doi: 10.4155/fmc.13.173. Epub 2013 Nov 27.
8
DrugBank 4.0: shedding new light on drug metabolism.
Nucleic Acids Res. 2014 Jan;42(Database issue):D1091-7. doi: 10.1093/nar/gkt1068. Epub 2013 Nov 6.
9
Chemoinformatic analysis of GRAS (Generally Recognized as Safe) flavor chemicals and natural products.
PLoS One. 2012;7(11):e50798. doi: 10.1371/journal.pone.0050798. Epub 2012 Nov 30.
10
Shannon entropy-based fingerprint similarity search strategy.
J Chem Inf Model. 2009 Jul;49(7):1687-91. doi: 10.1021/ci900159f.