Li Da-Zhou, Xu Xin, Pan Jia-Heng, Gao Wei, Zhang Shi-Rui
College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110000, China.
J Chem Inf Model. 2024 May 13;64(9):3640-3649. doi: 10.1021/acs.jcim.3c02082. Epub 2024 Feb 15.
The accurate identification and analysis of chemical structures in molecular images are prerequisites for applying artificial intelligence to drug discovery, so molecular images must be converted into machine-readable representations efficiently and automatically. In this paper, we propose Image2InChI, an automated optical recognition model for molecular images based on deep learning. Image2InChI introduces a novel attention-based feature fusion network that integrates image patch features with InChI prediction. An improved SwinTransformer serves as the encoder and a Transformer decoder with patch embedding serves as the decoder; together they predict the InChI that corresponds to the image features. In the experiments, Image2InChI achieved an InChI accuracy (InChI acc) of 99.8%, a Morgan fingerprint (Morgan FP) score of 94.1%, a maximum common structure accuracy (MCS acc) of 94.8%, and a longest common subsequence accuracy (LCS acc) of 96.2%. These results indicate that Image2InChI improves the accuracy and efficiency of molecular image recognition and provides a valuable reference for optical chemical structure recognition that targets InChI.
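The abstract does not define how the string-level metrics are computed. As a minimal sketch of one plausible reading, the Python below treats InChI accuracy as the exact-match rate between predicted and reference strings, and scores the longest common subsequence as its length divided by the length of the longer string. The function names and the normalization are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(pred: str, ref: str) -> float:
    """LCS length normalized by the longer string (assumed normalization)."""
    longest = max(len(pred), len(ref))
    return lcs_length(pred, ref) / longest if longest else 1.0


def exact_match_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference string."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not refs:
        return 0.0
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


if __name__ == "__main__":
    # Reference InChIs for ethanol and methane; the first prediction is
    # deliberately wrong (it is the InChI of ethane).
    refs = ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "InChI=1S/CH4/h1H4"]
    preds = ["InChI=1S/C2H6/c1-2/h1-2H3", "InChI=1S/CH4/h1H4"]
    print("InChI acc:", exact_match_accuracy(preds, refs))
    print("LCS scores:", [round(lcs_score(p, r), 3) for p, r in zip(preds, refs)])
```

Exact match is strict: a single wrong character counts as a miss, whereas the LCS score gives partial credit to near-correct strings. That is one reason the two figures can differ for the same model.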