Xu Zhanpeng, Li Jianhua, Yang Zhaopeng, Li Shiliang, Li Honglin
School of Information Science and Engineering, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China.
State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
J Cheminform. 2022 Jul 1;14(1):41. doi: 10.1186/s13321-022-00624-5.
Optical chemical structure recognition (OCSR) from scientific publications is essential for rediscovering chemical structures. It is an extremely challenging problem, and current rule-based and deep-learning methods do not achieve satisfactory recognition rates. Herein, we propose SwinOCSR, an end-to-end model based on the Swin Transformer. The model uses a Swin Transformer backbone to extract image features and a Transformer to convert the chemical information from publications into DeepSMILES. A novel chemical structure dataset was constructed to train and validate the method. The proposed Swin Transformer-based model was extensively tested against the backbones of existing publicly available deep-learning methods, and the experimental results show that it significantly outperforms them, demonstrating its effectiveness. Moreover, we used focal loss to address the token imbalance problem in the text representation of chemical structure diagrams, and our model achieved an accuracy of 98.58%.
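Focal loss reweights cross-entropy by a modulating factor, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the correct token. Well-predicted tokens therefore contribute little, and the training signal shifts toward rare, poorly predicted tokens. The following is a minimal, framework-free sketch of multi-class focal loss over decoder token positions. It is not the SwinOCSR implementation. The function names, the toy three-token vocabulary, the default γ = 2, and the optional per-token weights `alpha` are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    logits_batch: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Mean multi-class focal loss over token positions (illustrative sketch).

    FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t), where p_t is the
    softmax probability of the target token. With gamma = 0 and alpha = None
    this reduces to ordinary cross-entropy.
    """
    losses = []
    for logits, target in zip(logits_batch, targets):
        p_t = softmax(logits)[target]
        # Hypothetical per-token-class weight; 1.0 when no weights are given.
        alpha_t = 1.0 if alpha is None else alpha[target]
        # (1 - p_t) ** gamma shrinks the loss of tokens already predicted well.
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


# Toy 3-token vocabulary (assumption): index 0 stands for a frequent token,
# indices 1 and 2 for rarer ones. The logits are illustrative, not model outputs.
easy = [[4.0, 0.0, 0.0]]  # confident and correct when the target is token 0
hard = [[4.0, 0.0, 0.0]]  # same logits, but the target is the rare token 1

print("easy: CE", focal_loss(easy, [0], gamma=0.0), "FL", focal_loss(easy, [0]))
print("hard: CE", focal_loss(hard, [1], gamma=0.0), "FL", focal_loss(hard, [1]))
```

With γ = 2, the loss for the confidently predicted frequent token drops by a factor of about 800 relative to cross-entropy. The loss for the misclassified rare token drops by under 4%. This is the rebalancing effect the abstract relies on. The abstract does not state which γ or α values SwinOCSR uses. γ = 2 is simply a common default from the original focal-loss work.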