67 million natural product-like compound database generated via molecular language processing.

Author affiliations

Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.

Hwa Chong Institution, 661 Bukit Timah Road, Singapore, 269734, Republic of Singapore.

Publication information

Sci Data. 2023 May 19;10(1):296. doi: 10.1038/s41597-023-02207-x.

Abstract

Natural products are a rich resource of bioactive compounds for valuable applications across multiple fields such as food, agriculture, and medicine. For natural product discovery, high throughput in silico screening offers a cost-effective alternative to traditional resource-heavy assay-guided exploration of structurally novel chemical space. In this data descriptor, we report a characterized database of 67,064,204 natural product-like molecules generated using a recurrent neural network trained on known natural products, demonstrating a significant 165-fold expansion in library size over the approximately 400,000 known natural products. This study highlights the potential of using deep generative models to explore novel natural product chemical space for high throughput in silico discovery.
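
The fold-expansion figure in the abstract can be checked with simple arithmetic. The sketch below uses only the two counts quoted above. The variable names are illustrative, and the "implied baseline" is an inference from the stated 165-fold figure, not a number given in the abstract.

```python
# Figures quoted in the abstract.
generated = 67_064_204   # natural product-like molecules in the database
known_approx = 400_000   # "approximately 400,000" known natural products (rounded)

# Fold expansion computed from the rounded baseline.
fold_rounded = generated / known_approx

# Baseline size implied by the stated 165-fold expansion.
# This is an inference for illustration, not a value from the abstract.
stated_fold = 165
implied_known = generated / stated_fold

print(f"Fold expansion with rounded baseline: {fold_rounded:.1f}")
print(f"Baseline implied by a 165-fold expansion: {implied_known:,.0f}")
```

With the rounded baseline the ratio comes out slightly above 165, which suggests the authors computed the figure from an exact count of known natural products somewhat larger than 400,000.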

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1119/10199072/75b9a539eaae/41597_2023_2207_Fig1_HTML.jpg
