SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer.

Authors

Xu Zhanpeng, Li Jianhua, Yang Zhaopeng, Li Shiliang, Li Honglin

Affiliations

School of Information Science and Engineering, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China.

State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.

Publication information

J Cheminform. 2022 Jul 1;14(1):41. doi: 10.1186/s13321-022-00624-5.

Abstract

Optical chemical structure recognition from scientific publications is essential for rediscovering a chemical structure. It is an extremely challenging problem, and current rule-based and deep-learning methods cannot achieve satisfactory recognition rates. Herein, we propose SwinOCSR, an end-to-end model based on a Swin Transformer. This model uses the Swin Transformer as the backbone to extract image features and introduces Transformer models to convert chemical information from publications into DeepSMILES. A novel chemical structure dataset was constructed to train and verify our method. Our proposed Swin Transformer-based model was extensively tested against the backbone of existing publicly available deep learning methods. The experimental results show that our model significantly outperforms the compared methods, demonstrating the model's effectiveness. Moreover, we used a focal loss to address the token imbalance problem in the text representation of the chemical structure diagram, and our model achieved an accuracy of 98.58%.
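The abstract mentions a focal loss for token imbalance but does not give its formulation. As a rough illustration only, the sketch below uses the standard multi-class focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over the positions of a decoded token sequence. The function name `token_focal_loss`, the default `gamma=2.0`, and the toy three-token vocabulary are assumptions made for this example and are not taken from the paper.

```python
import math


def token_focal_loss(probs, targets, gamma=2.0):
    """Mean focal loss over a sequence of token predictions.

    probs   -- list of per-position probability distributions over the vocabulary
    targets -- list of target token indices, one per position
    gamma   -- focusing parameter (assumed default); gamma=0 gives plain cross-entropy
    """
    if not targets or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equally long")
    total = 0.0
    for dist, target in zip(probs, targets):
        # Probability assigned to the correct token, clamped to avoid log(0).
        p_t = max(dist[target], 1e-12)
        # Well-predicted tokens (p_t near 1) are down-weighted by (1 - p_t)^gamma.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(targets)


# Toy vocabulary of three tokens: a frequent token predicted confidently
# and a rare token predicted poorly.
easy = [[0.95, 0.03, 0.02]]
hard = [[0.60, 0.30, 0.10]]
easy_loss = token_focal_loss(easy, [0])
hard_loss = token_focal_loss(hard, [2])
```

With `gamma=0` the function reduces to ordinary cross-entropy. Larger values of `gamma` shrink the contribution of tokens the model already predicts well, so rare, poorly predicted tokens account for a larger share of the loss.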

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fbf/9248127/84b3b80599c0/13321_2022_624_Fig1_HTML.jpg
