Comparative analysis of an anthraquinone and chalcone derivatives-based virtual combinatorial library. A cheminformatics "proof-of-concept" study.

Affiliations

PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam.

Institute of Applied Data Analytics, Universiti Brunei Darussalam, Gadong, Brunei Darussalam.

Publication information

J Mol Graph Model. 2022 Dec;117:108307. doi: 10.1016/j.jmgm.2022.108307. Epub 2022 Aug 15.

Abstract

The state-of-the-art methods used here were a Laplacian scoring algorithm for gene selection and the Gini coefficient, which identifies the genes whose expression varies least across a large set of samples. These methods have not previously been tested for feasibility in cheminformatics. This study is a first attempt at a complete comparative analysis of a virtual combinatorial library based on anthraquinone and chalcone derivatives. This computational "proof-of-concept" study illustrates a combinatorial approach in which the structures of selected natural products (NPs) are subjected to molecular diversity analysis. A virtual combinatorial library (1.6 M compounds) based on 20 anthraquinones and 24 chalcones was enumerated. The resulting compounds were optimized toward near drug-like properties, and physicochemical descriptors were calculated for all datasets, including the FDA, Non-FDA, and NPs datasets from ZINC 15. UMAP and PCA were applied to compare and represent the chemical space coverage of each dataset. Subsequently, the Laplacian score and the Gini coefficient were applied for feature selection and to assess selectivity among properties, respectively. Finally, we demonstrated the diversity between the datasets by employing the Murcko and central scaffold systems, calculating three fingerprint descriptors, and analyzing their diversity by PCA and SOM. The optimized enumeration yielded 1,610,268 compounds with NP-likeness and synthetic feasibility mean scores close to those of the FDA, Non-FDA, and NPs datasets. The overlap between the chemical space of the 1.6 M database was more prominent than with the NPs dataset. The Laplacian score prioritized the NP-likeness and hydrogen bond acceptor properties (1.0 and 0.923, respectively), while the Gini coefficient showed that all properties have selective effects on the datasets (0.81-0.93). Scaffold and fingerprint diversity indicated that the descending order of diversity for the tested datasets was FDA, Non-FDA, NPs, and 1.6 M.
Virtual combinatorial libraries based on NPs can be considered a source of combinatorial compounds with NP-likeness properties. Furthermore, molecular diversity should be measured by several different methods to allow for comparison and better judgment.
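
The abstract does not say how the Laplacian score and the Gini coefficient were computed, so the following is only a minimal sketch of the two measures in their textbook forms, not the authors' implementation. It assumes a descriptor matrix with one row per compound and one column per property. The function names, the neighbor count `n_neighbors`, the heat-kernel width `t`, and the synthetic data are all illustrative assumptions. In the original Laplacian score formulation, smaller values mark features that better preserve local neighborhood structure; how the values reported above (1.0 and 0.923) were scaled is not stated in the abstract.

```python
import numpy as np


def gini_coefficient(values):
    """Gini coefficient of non-negative values: 0 means perfectly even."""
    x = np.sort(np.asarray(values, dtype=float))
    if x.size == 0 or np.any(x < 0):
        # Descriptors that can be negative (e.g. logP) would need shifting first.
        raise ValueError("values must be non-empty and non-negative")
    total = x.sum()
    if total == 0:
        return 0.0
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def laplacian_scores(X, n_neighbors=5, t=1.0):
    """Laplacian score per column of X (rows = samples); lower = more locality-preserving."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Dense pairwise squared Euclidean distances (fine for a small demo only).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # k-nearest-neighbor graph, excluding each point itself, then symmetrized.
    sq_no_self = sq.copy()
    np.fill_diagonal(sq_no_self, np.inf)
    nearest = np.argsort(sq_no_self, axis=1)[:, :n_neighbors]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = True
    adj |= adj.T
    # Heat-kernel edge weights, degree vector, and graph Laplacian L = D - S.
    S = np.where(adj, np.exp(-sq / t), 0.0)
    d = S.sum(axis=1)
    L = np.diag(d) - S
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()      # remove the degree-weighted mean
        denom = f_tilde @ (d * f_tilde)      # degree-weighted variance
        scores[r] = (f_tilde @ L @ f_tilde) / denom if denom > 0 else np.nan
    return scores


# Synthetic demo (hypothetical data): column 0 separates two clusters,
# columns 1 and 2 are pure noise.
rng = np.random.default_rng(0)
informative = np.concatenate([rng.normal(0.0, 0.1, 30), rng.normal(10.0, 0.1, 30)])
noise = rng.normal(0.0, 1.0, size=(60, 2))
X_demo = np.column_stack([informative, noise])

print(laplacian_scores(X_demo))        # column 0 should score lowest
print(gini_coefficient([1, 1, 1, 1]))  # 0.0
print(gini_coefficient([0, 0, 0, 1]))  # 0.75
```

The dense distance matrix keeps the sketch short but scales quadratically with the number of compounds, so a library of 1.6 M compounds would need a sparse nearest-neighbor graph instead.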
