Feature Reduction for Molecular Similarity Searching Based on Autoencoder Deep Learning.

Affiliations

School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia.

DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK.

Publication information

Biomolecules. 2022 Mar 27;12(4):508. doi: 10.3390/biom12040508.

Abstract

The concept of molecular similarity is widely used in rational drug design, where molecular databases are searched for structurally similar molecules in order to retrieve functionally similar ones. The most widely used conventional similarity methods use two-dimensional (2D) fingerprints to evaluate the similarity of molecules to a target query. However, these descriptors include redundant and irrelevant features that may degrade the performance of similarity searching methods. This study therefore proposes a new approach that identifies the important features of molecules in chemical datasets by representing the molecular features with an Autoencoder (AE), with the aim of removing irrelevant and redundant features. The proposed approach was evaluated on the standard MDL Drug Data Report (MDDR) dataset. In the experiments, it outperformed several existing benchmark similarity methods, including the Tanimoto Similarity Method (TAN), the Adapted Similarity Measure of Text Processing (ASMTP), and the Quantum-Based Similarity Method (SQB). Its advantage was most pronounced on structurally heterogeneous datasets, where it yielded better results than other previously used methods that share the goal of improving molecular similarity searching.
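For reference, the Tanimoto coefficient behind the TAN baseline can be computed on binary 2D fingerprints as sketched below, along with a simple ranking of a database against a query. This is a minimal illustration assuming fingerprints are equal-length lists of 0/1 bits; the function names `tanimoto` and `rank_by_similarity` are hypothetical, and the code does not reproduce the autoencoder-based method proposed in the paper.

```python
from typing import List, Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient between two binary fingerprints of equal length.

    T(A, B) = c / (a + b - c), where a and b are the numbers of bits set
    in each fingerprint and c is the number of bits set in both.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    on_a = sum(1 for x in a if x)
    on_b = sum(1 for y in b if y)
    common = sum(1 for x, y in zip(a, b) if x and y)
    denom = on_a + on_b - common
    # Assumption: two all-zero fingerprints are given similarity 0.0 here
    # (conventions differ between toolkits).
    return common / denom if denom else 0.0


def rank_by_similarity(
    query: Sequence[int], database: Sequence[Sequence[int]]
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs sorted from most to least similar to query."""
    scores = [(i, tanimoto(query, fp)) for i, fp in enumerate(database)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = [1, 1, 0, 1, 0, 0, 1, 0]
    database = [
        [0, 0, 1, 0, 1, 1, 0, 1],  # shares no set bits with the query
        [1, 0, 0, 1, 0, 0, 0, 0],  # 2 common bits: 2 / (4 + 2 - 2) = 0.5
        [1, 1, 0, 1, 0, 0, 1, 0],  # identical to the query
    ]
    print(rank_by_similarity(query, database))
```

Running the script prints `[(2, 1.0), (1, 0.5), (0, 0.0)]`. In a feature-reduction setting such as the one the abstract describes, the same ranking step would operate on the reduced representation rather than on the raw bit vector.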

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f82/9029813/1bf0b5f97a8d/biomolecules-12-00508-g001.jpg
