Discovery Chemistry, Genentech Incorporated, 1 DNA Way, South San Francisco, California 94080, USA.
J Chem Inf Model. 2012 Feb 27;52(2):285-92. doi: 10.1021/ci200330x. Epub 2011 Dec 28.
Automated registration of compounds from external sources is necessitated by the numerous compound acquisitions from vendors and by the increasing number of collaborations with external partners. A prerequisite for automating compound registration is a robust module for determining the structural novelty of the input structures. Any such tool needs to be able to take uncertainty about stereochemistry into account and to identify tautomeric forms of the same compound. It also needs to validate structures for potential mistakes in connectivity and stereochemistry. Genentech has implemented a Structure Normalization Module based on toolkits offered by OpenEye Scientific Software. The module is incorporated in a graphical application for single compound registration and in scripts for bulk registration. It is also used for checking compounds submitted by our collaborators via partner-specific Internet sites. The Genentech Structure Normalization Module employs the widely used V2000 molfile format to accommodate structures received from a wide variety of sources. To determine how much information is known about the stereochemistry of each compound, the module requires a separate stereochemical assignment. A structural uniqueness check is performed by comparing the canonical SMILES of a standard tautomer. This paper offers a discussion of the steps taken to validate the chemical structure and generate the canonical SMILES of the standard tautomer. It also describes the integration of the validation module in compound registration pathways.
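The uniqueness check described in the abstract (comparing the canonical SMILES of a standard tautomer) can be illustrated with a minimal sketch. This is not the Genentech module: the class `StructureRegistry`, the compound IDs, and the function `demo_normalize` are hypothetical names introduced here. The normalizer is a hand-written lookup table standing in for a real toolkit call, so the example only shows how a normalized key could drive the novelty decision.

```python
from typing import Callable, Dict, Optional


class StructureRegistry:
    """Toy registry keyed on a normalized structure string (hypothetical)."""

    def __init__(self, normalize: Callable[[str], str]) -> None:
        # `normalize` is expected to return the canonical SMILES of the
        # standard tautomer; in practice a cheminformatics toolkit does this.
        self._normalize = normalize
        self._by_key: Dict[str, str] = {}

    def find_existing(self, structure: str) -> Optional[str]:
        """Return the ID already registered for this structure, or None."""
        return self._by_key.get(self._normalize(structure))

    def register(self, compound_id: str, structure: str) -> bool:
        """Register the structure if novel; return True if an entry was created."""
        key = self._normalize(structure)
        if key in self._by_key:
            return False
        self._by_key[key] = compound_id
        return True


# Hand-written stand-in for a real normalizer (assumption, demo only):
# both tautomeric forms of 2-pyridone map to one arbitrarily chosen
# representative string.
_DEMO_TABLE = {
    "Oc1ccccn1": "O=c1cccc[nH]1",
    "O=c1cccc[nH]1": "O=c1cccc[nH]1",
}


def demo_normalize(smiles: str) -> str:
    """Look up the representative form; unknown inputs pass through unchanged."""
    return _DEMO_TABLE.get(smiles, smiles)


registry = StructureRegistry(demo_normalize)
first = registry.register("CMPD-0001", "Oc1ccccn1")       # hydroxy form
second = registry.register("CMPD-0002", "O=c1cccc[nH]1")  # oxo form, same compound
existing = registry.find_existing("O=c1cccc[nH]1")
```

In this sketch the second submission is rejected as a duplicate because both tautomers reduce to the same key. How the separate stereochemical assignment mentioned in the abstract enters the comparison is not specified there and is left out here.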