Facile Solutions to the Problems Associated with Chemical Information and Mathematical Symbolism While Using Machine Translation Tools.

Author affiliations

Department of Chemistry & Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States.

Department of Chemistry, School of Sciences & Engineering, The American University in Cairo, New Cairo 11835, Egypt.

Publication information

J Chem Inf Model. 2020 Jul 27;60(7):3423-3430. doi: 10.1021/acs.jcim.0c00274. Epub 2020 Jun 25.

Abstract

Computer-aided translation technology has made tremendous progress in accuracy in the past few years. Chemical Abstracts Service of the American Chemical Society summarizes scientific works from more than 50 languages and allows users to search papers in nine selected languages. Currently, only the abstracts are rendered into English, by human experts or by machine translation, because full-text translation of millions of articles is beyond human capacity today. An English translation of a foreign-language research paper, scientific book, or patent is often required for research, data mining, and historical purposes. Many fundamental papers in chemistry, quantum chemistry, physics, and mathematics contain a significant number of chemical or mathematical equations. One of the major known problems in machine translation of such symbolically dense texts is incorrect or meaningless output. This article describes how to optimize existing machine translation tools to read foreign-language papers with embedded chemical or mathematical equations. German and French were selected to illustrate translation into English. Direct upload of text with extensive symbolism is possible with certain services, but this occasionally produces erroneous English renditions. A facile solution to the problems associated with embedded equations and mathematical formulas is to replace the equations and notations with "dummy" variables. After translation, the placeholder or dummy symbols are removed and the original equations are substituted back. This approach, which can be automated in the future, relies on the idea that chemical formulas and mathematical notations are universal. Following the guidelines in the article, excellent translations can be produced from a text interspersed with equations and chemical symbols.
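The placeholder idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes equations are delimited as `$...$` in the source text, the token format `XEQ0X` is made up for this example, and `translate` stands for any caller-supplied function that wraps a machine translation service (no real service API is used here).

```python
import re

# Assumption: equations appear inline between dollar signs, e.g. $E = mc^2$.
EQUATION_PATTERN = re.compile(r"\$[^$]+\$")


def mask_equations(text):
    """Replace each equation with a numbered dummy token.

    Returns the masked text and a dict mapping each token to its equation.
    """
    mapping = {}

    def substitute(match):
        # Hypothetical token format; the trailing "X" keeps XEQ1X from
        # being a prefix of XEQ10X when tokens are restored.
        token = f"XEQ{len(mapping)}X"
        mapping[token] = match.group(0)
        return token

    return EQUATION_PATTERN.sub(substitute, text), mapping


def restore_equations(text, mapping):
    """Put the original equations back in place of the dummy tokens."""
    for token, equation in mapping.items():
        text = text.replace(token, equation)
    return text


def translate_with_placeholders(text, translate):
    """Mask equations, translate the remaining prose, then restore them.

    `translate` is a caller-supplied callable (str -> str); it is a
    stand-in for a machine translation tool, not a real API.
    """
    masked, mapping = mask_equations(text)
    return restore_equations(translate(masked), mapping)
```

In practice the dummy tokens would need to be strings that the chosen translation tool leaves unchanged, which should be checked for each tool and language pair.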

