Zhang Dehai, Zhao Di, Wang Zhengwu, Li Junhui, Li Jin
The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming, China
RSC Adv. 2024 Jun 6;14(26):18182-18191. doi: 10.1039/d4ra02442g.
In the growing body of scientific literature, the structure and information of drugs are usually represented as two-dimensional vector graphics. Drug compound structures in this form are difficult for computers to recognize and use. Although the current optical chemical structure recognition (OCSR) paradigm has shown good performance, most existing work treats the task as a single isolated whole. This paper proposes MMSSC-Net, a multi-stage cognitive neural network model that predicts molecular vector graphics at a finer granularity. Based on cognitive methods, we construct a model that builds a fine-grained perceptual representation of molecular images from the bottom up, in stages. In the first stage, atoms and bonds are represented as a latent discrete label sequence (atom type, bond type, functional group, etc.). The second stage constructs the molecular graph from the label sequence, and the final stage converts the molecular graph, in an extensible manner, into a machine-readable sequence. Experimental results show that MMSSC-Net outperforms current advanced methods on multiple public datasets, achieving an accuracy of 75-94% on cognitive recognition at different resolutions. Its sequence-based cognitive method makes MMSSC-Net more reliable in terms of interpretability and transferability, and provides new ideas for drug information discovery and for exploring unknown chemical space.
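The staged flow described in the abstract (label sequence, then molecular graph, then machine-readable sequence) can be illustrated with a small sketch. This is not the MMSSC-Net implementation: the neural first stage is replaced by hand-written labels, the names `MolGraph`, `build_graph`, and `to_smiles` are hypothetical, and the output format is assumed to be a SMILES-like string restricted to acyclic molecules made of common organic atoms.

```python
from dataclasses import dataclass, field

# Assumed mapping from bond order to SMILES bond symbol (single bonds are implicit).
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


@dataclass
class MolGraph:
    """Hypothetical stage-2 output: atom symbols plus an adjacency list."""
    atoms: list[str]
    adjacency: dict[int, list[tuple[int, int]]] = field(default_factory=dict)


def build_graph(atom_labels, bond_labels):
    """Stage 2 (sketch): turn discrete labels into a molecular graph.

    atom_labels: list of atom symbols, e.g. ["C", "C", "O"]
    bond_labels: list of (atom_index_a, atom_index_b, bond_order) tuples
    """
    graph = MolGraph(atoms=list(atom_labels))
    for idx in range(len(atom_labels)):
        graph.adjacency[idx] = []
    for i, j, order in bond_labels:
        graph.adjacency[i].append((j, order))
        graph.adjacency[j].append((i, order))
    return graph


def to_smiles(graph, start=0):
    """Stage 3 (sketch): depth-first walk that emits a SMILES-like string.

    Only acyclic graphs are supported; a ring raises ValueError.
    """
    visited = set()

    def walk(node, parent):
        if node in visited:
            raise ValueError("rings are not handled in this sketch")
        visited.add(node)
        out = graph.atoms[node]
        children = [(n, o) for n, o in graph.adjacency[node] if n != parent]
        for k, (child, order) in enumerate(children):
            piece = BOND_SYMBOL[order] + walk(child, node)
            # Every child except the last is written as a parenthesised branch.
            out += piece if k == len(children) - 1 else f"({piece})"
        return out

    return walk(start, None)


# Stage 1 is assumed to have produced these labels for acetic acid.
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
print(to_smiles(build_graph(atoms, bonds)))  # CC(=O)O
```

Keeping the intermediate graph explicit means each stage can be inspected or swapped independently, which is the kind of interpretability the staged design is said to aim for.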