Jindalertudomdee Jira, Hayashida Morihiro, Akutsu Tatsuya
Laboratory of Mathematical Bioinformatics, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.
J Comput Biol. 2016 Aug;23(8):625-40. doi: 10.1089/cmb.2016.0056. Epub 2016 Jun 27.
Enumeration of chemical structures satisfying given conditions is an important step in the discovery of new compounds and drugs, as well as in structure elucidation. One of the most frequently used conditions in enumeration is the number of atoms of each chemical element, which corresponds to the chemical formula. In this work, we propose a novel, efficient enumeration algorithm, BfsStructEnum, which allows users to define the desired cyclic structures and enumerates, from a given chemical formula, all nonredundant chemical compounds whose only cyclic structures are the defined ones. To evaluate its performance, we compare the numbers of structures enumerated by BfsStructEnum and by MOLGEN 5.0, the latest version of a general-purpose structure generator, and we also compare their computation times. The results show that BfsStructEnum enumerates the same number of structures as MOLGEN 5.0 while being significantly faster. By compressing each cyclic structure into a single node and representing chemical compounds as tree structures instead of general graphs, the enumeration can be executed more efficiently.
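To make the tree-based idea concrete, the following is a minimal sketch, not BfsStructEnum itself: it enumerates the nonredundant acyclic carbon skeletons of alkanes (formula C_nH_{2n+2}, where hydrogens simply fill the remaining valences) by growing trees one node at a time, level by level, and discarding duplicates through a canonical string computed from the tree center(s). The function names (`tree_centers`, `canonical_form`, `enumerate_trees`) and the `max_degree` parameter are hypothetical illustrations. A compressed cyclic structure would correspond to a node type with its own valence and attachment rules, which this sketch does not model.

```python
def tree_centers(adj):
    """Return the one or two center vertices of a tree (adjacency lists)."""
    n = len(adj)
    if n <= 2:
        return list(range(n))
    degree = [len(nbrs) for nbrs in adj]
    leaves = [v for v in range(n) if degree[v] == 1]
    remaining = n
    # Peel off leaf layers until at most two vertices remain.
    while remaining > 2:
        remaining -= len(leaves)
        new_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    new_leaves.append(nb)
        leaves = new_leaves
    return leaves


def rooted_code(adj, root, parent=-1):
    """Canonical parenthesis string of the subtree hanging from root."""
    children = sorted(rooted_code(adj, c, root) for c in adj[root] if c != parent)
    return "(" + "".join(children) + ")"


def canonical_form(adj):
    """Isomorphism-invariant key: minimum rooted code over the center(s)."""
    return min(rooted_code(adj, c) for c in tree_centers(adj))


def enumerate_trees(n, max_degree=4):
    """All nonisomorphic trees with n >= 1 nodes and degree <= max_degree.

    max_degree=4 models carbon valence, so the result corresponds to the
    structural isomers of the alkane C_nH_{2n+2} (stereoisomers ignored).
    """
    level = {canonical_form([[]]): [[]]}
    for _ in range(n - 1):
        next_level = {}
        for adj in level.values():
            # Every tree of size k+1 arises by attaching a leaf to some
            # tree of size k, so extending each vertex covers all cases.
            for v in range(len(adj)):
                if len(adj[v]) < max_degree:
                    new_adj = [list(nbrs) for nbrs in adj]
                    new_adj.append([v])
                    new_adj[v].append(len(adj))
                    # Keep only the first tree seen for each canonical key.
                    next_level.setdefault(canonical_form(new_adj), new_adj)
        level = next_level
    return list(level.values())


if __name__ == "__main__":
    print([len(enumerate_trees(n)) for n in range(1, 11)])
    # → [1, 1, 1, 2, 3, 5, 9, 18, 35, 75]
```

The counts match the known numbers of alkane structural isomers. Deduplicating by canonical form trades memory (one stored representative per class) for simplicity; a dedicated generator would more likely avoid producing duplicates in the first place.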