Degtyarenko Kirill, Ennis Marcus, Garavelli John S
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
In Silico Biol. 2007;7(2 Suppl):S45-56.
A structural diagram, in the form of a two-dimensional (2-D) sketch, remains the most effective portrait of a "small molecule" or chemical reaction. However, such structural diagrams, like any other core data, cannot be used in speech (and should not be used in free text). "Good annotation practice" for biological databases is to refer to the molecule of interest using either consistent and widely recognised terminology or unique identifiers from a dedicated database. Ideally, scientists should use terminology that is both pronounceable and meaningful. Thus, a viable solution for a bioinformatician is to use a definitive controlled vocabulary of biochemical compounds and reactions that contains both systematic and common names. In addition, chemical ontologies provide a means of placing entities of interest into wider chemical, biological or medical contexts. We present some challenges and achievements in the standardisation of chemical language in biological databases, with emphasis on three aspects of annotation: (1) good drawing practice: how to draw unambiguous 2-D diagrams; (2) good naming practice: how to give the most appropriate names; and (3) good ontology practice: how to link the entity of interest to other entities by defined logical relationships.