Collado-Vides Julio, Gaudet Pascale, de Lorenzo Víctor
Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.
Department of Biomedical Engineering, Boston University, Boston, MA, United States.
Front Physiol. 2022 Feb 28;13:815874. doi: 10.3389/fphys.2022.815874. eCollection 2022.
Knowledge of biological organisms gathered at the molecular level is now organized into databases, often within ontological frameworks. Controlled vocabularies have been essential for enabling computational comparisons of annotations across different genomes and organisms. Examples include the functional annotation classifications used for bacteria, such as MultiFun, and the more widely used Gene Ontology. The functions of individual gene products, together with the processes in which sets of them participate, constitute a wealth of classes that describe the biological roles of gene products in a large number of organisms across the three domains of life. In this contribution, we highlight, from a qualitative perspective, some limitations of these frameworks. We also discuss the challenges that must be addressed to bridge the gap between annotation as currently captured by ontologies and databases and our understanding of the basic principles underlying the organization and functioning of organisms, and we illustrate these challenges with examples from bacteria. We hope that raising awareness of these issues will encourage users of the Gene Ontology and similar ontologies to interpret data with care, and will lead to improved data representation.