Chen Yufan, Leung Ching Ting, Huang Yong, Sun Jianwei, Chen Hao, Gao Hanyu
Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China.
Department of Chemistry, Hong Kong University of Science and Technology, Hong Kong SAR, China.
J Cheminform. 2024 Dec 18;16(1):141. doi: 10.1186/s13321-024-00926-w.
In the field of chemical structure recognition, converting molecular images into machine-readable formats such as SMILES strings remains a significant challenge, primarily due to the varied drawing styles and conventions prevalent in the chemical literature. To bridge this gap, we propose MolNexTR, a novel image-to-graph deep learning model that fuses the strengths of ConvNeXt, a powerful convolutional neural network variant, and the Vision Transformer. This integration enables more detailed extraction of both local and global features from molecular images. MolNexTR predicts atoms and bonds simultaneously and understands their layout rules. It also flexibly integrates symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including an improved data augmentation module, an image contamination module, and a post-processing module that generates the final SMILES output. These modules cooperate to enhance the model's robustness to the diverse styles of molecular images found in real literature. On our test sets, MolNexTR demonstrates superior performance, achieving accuracy rates of 81-97%, marking a significant advancement in the domain of molecular structure recognition.
Scientific contribution
MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.
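The dual-stream idea of pairing a convolutional backbone with a Vision Transformer can be illustrated with a minimal sketch in PyTorch/timm. The specific backbones (convnext_tiny, vit_small_patch16_224), the projection layers, and the concatenation-based fusion below are illustrative assumptions, not the published MolNexTR architecture.

```python
import torch
import torch.nn as nn
import timm

class DualStreamEncoder(nn.Module):
    """Sketch of a dual-stream image encoder: a ConvNeXt branch for local
    features and a ViT branch for global features, projected to a shared
    embedding size and concatenated as a token sequence."""

    def __init__(self, embed_dim=512):
        super().__init__()
        # ConvNeXt backbone returning spatial feature maps per stage
        self.cnn = timm.create_model("convnext_tiny", pretrained=False,
                                     features_only=True)
        cnn_dim = self.cnn.feature_info.channels()[-1]  # last-stage channels
        # ViT backbone returning patch tokens
        self.vit = timm.create_model("vit_small_patch16_224", pretrained=False,
                                     num_classes=0)
        vit_dim = self.vit.num_features
        self.cnn_proj = nn.Linear(cnn_dim, embed_dim)
        self.vit_proj = nn.Linear(vit_dim, embed_dim)

    def forward(self, images):                           # (B, 3, 224, 224)
        fmap = self.cnn(images)[-1]                      # (B, C, H', W')
        local_tokens = fmap.flatten(2).transpose(1, 2)   # (B, H'*W', C)
        local_tokens = self.cnn_proj(local_tokens)
        global_tokens = self.vit.forward_features(images)  # (B, N+1, D)
        global_tokens = self.vit_proj(global_tokens)
        # A graph decoder would attend over the combined token sequence
        return torch.cat([local_tokens, global_tokens], dim=1)

# usage
# enc = DualStreamEncoder()
# tokens = enc(torch.randn(1, 3, 224, 224))
```

A downstream decoder that predicts atoms and bonds jointly would consume this token sequence; the fusion strategy shown here (simple concatenation) is only one of several plausible choices.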
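The final post-processing step, turning a predicted atom/bond graph into a SMILES string, can likewise be sketched with RDKit. The graph_to_smiles helper and its input format (element symbols plus (i, j, order) bond triples) are hypothetical and shown only to illustrate the graph-to-SMILES conversion, not the paper's actual post-processing module.

```python
from rdkit import Chem

def graph_to_smiles(atoms, bonds):
    """Assemble a predicted graph into an RDKit molecule and emit
    canonical SMILES. atoms: list of element symbols; bonds: list of
    (i, j, order) triples with order in {1, 2, 3}."""
    mol = Chem.RWMol()
    idx = [mol.AddAtom(Chem.Atom(sym)) for sym in atoms]
    order_map = {1: Chem.BondType.SINGLE,
                 2: Chem.BondType.DOUBLE,
                 3: Chem.BondType.TRIPLE}
    for i, j, order in bonds:
        mol.AddBond(idx[i], idx[j], order_map[order])
    m = mol.GetMol()
    Chem.SanitizeMol(m)          # valence check, aromaticity perception
    return Chem.MolToSmiles(m)   # canonical SMILES

# e.g. benzene from a predicted Kekulé graph -> "c1ccccc1"
# graph_to_smiles(["C"] * 6,
#                 [(0, 1, 2), (1, 2, 1), (2, 3, 2),
#                  (3, 4, 1), (4, 5, 2), (5, 0, 1)])
```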