Predicting protein function with hierarchical phylogenetic profiles: the Gene3D Phylo-Tuner method applied to eukaryotic genomes.

Authors

Ranea Juan A G, Yeats Corin, Grant Alastair, Orengo Christine A

Affiliation

Department of Biochemistry and Molecular Biology, University College London, London, United Kingdom.

Publication details

PLoS Comput Biol. 2007 Nov;3(11):e237. doi: 10.1371/journal.pcbi.0030237. Epub 2007 Oct 18.

Abstract

"Phylogenetic profiling" is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune" with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence-absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes.
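
The profile comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the Euclidean distance is normalised, so scaling each copy-number profile to unit length is an assumption made here. The function names, the `(identity level, cluster name)` keys, and the copy numbers are all hypothetical.

```python
import math
from itertools import product


def normalise(profile):
    """Scale a copy-number profile to unit length.

    Assumption: the abstract only says "normalised Euclidean distances";
    unit-length scaling is one plausible choice, not the published one.
    """
    norm = math.sqrt(sum(x * x for x in profile))
    if norm == 0:
        raise ValueError("profile has zero copies in every genome")
    return [x / norm for x in profile]


def normalised_distance(p, q):
    """Euclidean distance between two profiles after unit-length scaling."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    a, b = normalise(p), normalise(q)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_level_pair(family_a, family_b):
    """Compare every subcluster profile of one superfamily with every
    subcluster profile of the other, across all identity levels, and
    return (distance, key_a, key_b) for the closest pair."""
    best = None
    for (key_a, prof_a), (key_b, prof_b) in product(
        family_a.items(), family_b.items()
    ):
        d = normalised_distance(prof_a, prof_b)
        if best is None or d < best[0]:
            best = (d, key_a, key_b)
    return best


if __name__ == "__main__":
    # Hypothetical domain copy numbers in five genomes, keyed by
    # (sequence identity level in %, subcluster name).
    family_a = {(30, "a1"): [10, 4, 8, 2, 6], (60, "a2"): [2, 1, 2, 0, 1]}
    family_b = {(30, "b1"): [1, 9, 2, 7, 3], (60, "b2"): [4, 2, 4, 0, 2]}
    print(best_level_pair(family_a, family_b))
```

In this toy data the two 60% subclusters have proportional copy numbers across genomes, so they are the closest pair even though the 30% profiles of the same superfamilies look unrelated. This is the sense in which a pair of families can "auto-tune" at a particular similarity level.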

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45aa/2098864/2090bdcb88c6/pcbi.0030237.g001.jpg
