LINGO-DL: a text-based approach for molecular similarity searching.

Affiliation

Universite de Lille, Villeneuve d'Ascq cedex, France.

Publication information

J Comput Aided Mol Des. 2021 May;35(5):657-665. doi: 10.1007/s10822-021-00383-9. Epub 2021 Apr 2.

Abstract

Line notations of chemical structures are more compact than graph representations and connection tables, which makes them useful for storing and transferring large numbers of molecular structures. The Simplified Molecular Input Line Entry System (SMILES) is the most widely used line notation, because it is easier to use and understand than the alternatives and can be generated automatically from connection tables. A SMILES string encodes the structure of a molecule. An existing method, LINGO, uses SMILES strings to calculate molecular similarities and to predict structure-related properties. LINGO decomposes a canonical SMILES string into the set of its overlapping four-character substrings, referred to as LINGOs, and measures the similarity between a pair of molecules by comparing the LINGOs that occur in each. This paper introduces an alternative version of the LINGO method, called LINGO-DL, that uses LINGOs of different lengths: canonical SMILES strings are fragmented into substrings of three different lengths rather than the single length used by LINGO. Retrospective virtual screening experiments on the MDDR, DUD, and MUV datasets show that LINGO-DL outperforms LINGO, especially when the active molecules being sought are structurally heterogeneous.
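To make the decomposition described in the abstract concrete, here is a minimal Python sketch under stated assumptions, not the authors' implementation. It splits a SMILES string into overlapping substrings of length q, compares two substring profiles with a multiset Tanimoto coefficient (one common choice; the exact similarity formula used in the LINGO and LINGO-DL papers may differ), and pools several lengths to mimic the LINGO-DL idea. The lengths (3, 4, 5), the pooling step, the ring-digit normalisation, and all function names (`normalise`, `lingos`, `multiset_tanimoto`, `lingo_dl_profile`, `rank_database`) are hypothetical; the abstract does not state which three lengths LINGO-DL uses or how the lengths are combined. Inputs are assumed to be canonical SMILES already produced by a cheminformatics toolkit.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def normalise(smiles: str) -> str:
    # Assumption: ring-closure digits are set to '0' so that ring numbering does
    # not change the substrings. This crude version rewrites every digit,
    # including digits inside bracket atoms such as [NH3+].
    return re.sub(r"\d", "0", smiles)


def lingos(smiles: str, q: int = 4) -> Counter:
    """Multiset of all overlapping substrings of length q (q = 4 in LINGO)."""
    s = normalise(smiles)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def multiset_tanimoto(a: Counter, b: Counter) -> float:
    """Sum of minimum counts over sum of maximum counts; 1.0 for identical profiles."""
    keys = a.keys() | b.keys()
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def lingo_dl_profile(smiles: str, lengths: Iterable[int] = (3, 4, 5)) -> Counter:
    """Hypothetical LINGO-DL profile: pool the substrings of several lengths."""
    profile: Counter = Counter()
    for q in lengths:
        profile.update(lingos(smiles, q))
    return profile


def rank_database(query: str, database: List[str],
                  lengths: Iterable[int] = (3, 4, 5)) -> List[Tuple[float, str]]:
    """Rank database SMILES by similarity to a query, most similar first."""
    lengths = tuple(lengths)  # materialise so a generator can be reused per molecule
    qp = lingo_dl_profile(query, lengths)
    scored = [(multiset_tanimoto(qp, lingo_dl_profile(s, lengths)), s) for s in database]
    return sorted(scored, key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    hits = rank_database("CCO", ["c1ccccc1", "CCCO", "CCO"])
    print([s for _, s in hits])  # ['CCO', 'CCCO', 'c1ccccc1']
```

Because substrings of different lengths can never be equal, pooling them into one `Counter` keeps the per-length fragments distinct while a single score reflects all of them; averaging one similarity per length would be an equally plausible reading of the abstract. A retrospective screening run in this style would rank a database against a known active and count how many other actives appear near the top of the list.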
