Sayle Roger
OpenEye Scientific Software, Santa Fe, New Mexico 87508, USA.
J Chem Inf Model. 2009 Mar;49(3):519-30. doi: 10.1021/ci800243w.
Chemical compound names remain the primary method for conveying molecular structures between chemists and researchers. In research articles, patents, chemical catalogues, government legislation, and textbooks, the use of IUPAC and traditional compound names is universal, despite efforts to introduce more machine-friendly representations such as identifiers and line notations. Fortunately, advances in computing power now allow chemical names to be parsed and generated (read and written) with almost the same ease as conventional connection tables. A major complication, however, is that although the vast majority of chemistry uses English nomenclature, a significant fraction uses other languages. This complicates the task of filing and analyzing chemical patents, purchasing from compound vendors, and text-mining research articles or Web pages. We describe some issues with manipulating chemical names in various languages, including British and American English, German, Japanese, Chinese, Spanish, Swedish, Polish, and Hungarian, and describe the current state of the art in software tools to simplify the process.