Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.

Affiliation

Swiss Institute for Bioinformatics, CMU, 1 Rue Michel Servet, 1211 Geneva 4, Switzerland.

Publication information

Brief Bioinform. 2011 Sep;12(5):449-62. doi: 10.1093/bib/bbr042. Epub 2011 Aug 27.

Abstract

The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make inferences as accurate as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer, in a semi-automated manner, annotations for proteins from a broad set of genomes on the basis of experimental annotations. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, the Phylogenetic Annotation and INference Tool (PAINT), helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and to record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context, with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other widely used homology-based methods.
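The idea of asserting where a function was gained or lost on a phylogenetic tree, and letting descendants inherit it, can be illustrated with a minimal sketch. This is not PAINT's implementation: the `Node` class, the `propagate` function, the tree shape, and the labels `term_A` and `term_B` are all hypothetical and chosen only for illustration. The sketch assumes a curator has already marked gain and loss events on individual tree nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a (hypothetical) protein family tree."""
    name: str
    children: list[Node] = field(default_factory=list)
    gained: set[str] = field(default_factory=set)  # functions asserted as gained here
    lost: set[str] = field(default_factory=set)    # functions asserted as lost here


def propagate(node: Node, inherited: frozenset = frozenset(), out: dict | None = None) -> dict:
    """Pass functions from ancestors to descendants, applying gains and losses.

    Returns a mapping from leaf name to the set of functions inferred for it.
    """
    if out is None:
        out = {}
    # A node carries what its ancestor had, plus its own gains, minus its own losses.
    current = (set(inherited) | node.gained) - node.lost
    if not node.children:
        out[node.name] = current
    for child in node.children:
        propagate(child, frozenset(current), out)
    return out


# Hypothetical family: term_A is gained at the root, lost in one leaf,
# and replaced by term_B in a second clade.
root = Node(
    "root",
    gained={"term_A"},
    children=[
        Node("clade1", children=[
            Node("leaf_a"),
            Node("leaf_b", lost={"term_A"}),
        ]),
        Node("clade2", lost={"term_A"}, gained={"term_B"}, children=[
            Node("leaf_c"),
        ]),
    ],
)

result = propagate(root)
```

Here `result` maps each leaf to the functions it would inherit under these assumed events; in a real curation setting each gain or loss would also carry the supporting evidence the abstract mentions, which this sketch omits.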

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7ff/3178059/8054bca39e19/bbr042f1.jpg
