Colmenarejo Gonzalo
Biostatistics and Bioinformatics Unit, IMDEA Food, E28049 Madrid, Spain.
J Chem Inf Model. 2025 Feb 10;65(3):1061-1066. doi: 10.1021/acs.jcim.4c02268. Epub 2025 Jan 28.
Functional groups are widely used in organic chemistry because they provide a rationale for analyzing physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand-biomacromolecule interactions. Ertl's algorithm extracts functional groups from arbitrary organic molecules without depending on predefined libraries of functional groups. However, the widely used RDKit cheminformatics toolkit lacks a complete and accurate implementation of Ertl's algorithm. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides (i) a PNG binary string with an image of the molecule with color-highlighted functional groups; (ii) a list of sets of atom indices (idx), each set corresponding to a functional group; (iii) a list of canonicalized pseudo-SMILES strings for the full functional groups; and (iv) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at https://github.com/bbu-imdea/efgs and is part of the RDKit Contrib directory (https://github.com/rdkit/rdkit/tree/master/Contrib/efgs).
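To make output (ii) concrete, the following is a minimal, library-free sketch of how marked atoms can be merged into sets of atom indices. It is not the efgs code and does not use its API: the function names `mark_atoms` and `merge_groups`, the hand-written molecular graph, and the reduced marking rules (heteroatoms, carbons multiply bonded to a heteroatom, and carbons in C=C or C≡C bonds) are assumptions made for illustration. Aromaticity, acetal carbons, and three-membered heterocycles, which the full algorithm also considers, are left out.

```python
from collections import defaultdict

def mark_atoms(symbols, bonds):
    """Return indices of atoms marked under a simplified rule set (assumption)."""
    # Every heteroatom (anything other than C or H) is marked.
    marked = {i for i, s in enumerate(symbols) if s not in ("C", "H")}
    for a, b, order in bonds:
        if order < 2:
            continue
        # Carbon joined to a heteroatom by a double or triple bond.
        for carbon, other in ((a, b), (b, a)):
            if symbols[carbon] == "C" and symbols[other] not in ("C", "H"):
                marked.add(carbon)
        # Both carbons of a C=C or C#C bond (aromaticity is not modeled here).
        if symbols[a] == "C" and symbols[b] == "C":
            marked.update((a, b))
    return marked

def merge_groups(marked, bonds):
    """Merge bonded marked atoms into a list of sets of atom indices."""
    adjacency = defaultdict(set)
    for a, b, _ in bonds:
        if a in marked and b in marked:
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        # Depth-first walk over marked neighbors only.
        component, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        groups.append(component)
    return groups

# 3-aminopropanoic acid, NCCC(=O)O, written out by hand as (atom, atom, bond order).
symbols = ["N", "C", "C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2), (3, 5, 1)]

groups = merge_groups(mark_atoms(symbols, bonds), bonds)
print(groups)
```

Here the amine nitrogen forms one group and the carboxylic acid atoms form another. The real implementation works on RDKit mol objects and additionally returns the image, the pseudo-SMILES strings, and the labeled mol objects described in the abstract.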