Effectiveness of molecular fingerprints for exploring the chemical space of natural products.

Authors

Boldini Davide, Ballabio Davide, Consonni Viviana, Todeschini Roberto, Grisoni Francesca, Sieber Stephan A

Affiliations

TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany.

Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy.

Publication details

J Cheminform. 2024 Mar 25;16(1):35. doi: 10.1186/s13321-024-00830-3.

Abstract

Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, they have different structural motifs than typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters and a higher fraction of sp³-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto choice for encoding drug-like compounds, other fingerprints matched or outperformed them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as the data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
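
The idea that two encodings can give different views of the same compounds can be illustrated with a minimal sketch. It is not the authors' code and does not use their package: the toy fingerprints (sets of on-bit indices), the function names, and the choice of Pearson correlation as the agreement measure are all assumptions made for illustration. It computes Tanimoto similarity for every pair of molecules under two hypothetical encodings and then correlates the two lists of pairwise similarities.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarities(fingerprints: list) -> list:
    """Tanimoto similarity for every unordered pair of molecules."""
    return [tanimoto(x, y) for x, y in combinations(fingerprints, 2)]


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: the same four molecules under two made-up encodings.
encoding_a = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5}),
    frozenset({1, 2, 6}),
    frozenset({7, 8}),
]
encoding_b = [
    frozenset({10, 11}),
    frozenset({10, 12, 13}),
    frozenset({11, 14}),
    frozenset({10, 11, 12}),
]

sims_a = pairwise_similarities(encoding_a)
sims_b = pairwise_similarities(encoding_b)
agreement = pearson(sims_a, sims_b)
print(f"Correlation of pairwise similarities: {agreement:.3f}")
```

A correlation close to 1 would mean the two encodings rank molecule pairs in much the same way; a low value would mean they describe the chemical space differently. In practice the bit sets would come from a fingerprinting library rather than being written by hand.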
