Robust phylogenetic profile clustering for proteins.

Author information

Harrison Paul M

Affiliation

Department of Biology, McGill University, Montreal, Quebec, Canada.

Publication information

PeerJ. 2025 Apr 28;13:e19370. doi: 10.7717/peerj.19370. eCollection 2025.

Abstract

BACKGROUND

Genes are continually formed and lost as a genome evolves. However, new genes may tend to appear during specific evolutionary epochs rather than others, or disappear together in a more recent organismal clade. Methods to identify gene origination might simply use the last common ancestor to contain an ortholog as the putative gene origination point, or use a heuristic threshold that allows for a certain number of missing orthologs in the cohort of species examined. Here, to avoid such issues, an alternative approach based on the clustering of phylogenetic profiles is applied, and the results are examined for any evidence of epochal trends in gene origination, and of associated trends in specific sequence traits or functional associations.
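The two conventional approaches mentioned above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the nested clade list, the species labels, and the reading of the heuristic threshold as a minimum fraction of species per clade are hypothetical and are not taken from the paper.

```python
# Hypothetical nested clades around a focal species, youngest to oldest.
# Each entry lists the species that the clade adds to the previous one.
CLADES = [
    ("clade_1", ["sp_a", "sp_b"]),
    ("clade_2", ["sp_c", "sp_d"]),
    ("clade_3", ["sp_e", "sp_f", "sp_g"]),
]


def origin_by_lca(present):
    """Return the oldest clade in which any added species has an ortholog."""
    origin = None
    for name, species in CLADES:
        if any(s in present for s in species):
            origin = name
    return origin


def origin_by_threshold(present, min_fraction=0.5):
    """Return the oldest clade in which at least min_fraction of the
    added species have an ortholog (one possible heuristic threshold)."""
    origin = None
    for name, species in CLADES:
        fraction = sum(s in present for s in species) / len(species)
        if fraction >= min_fraction:
            origin = name
    return origin


# Hypothetical ortholog calls for one gene.
present = {"sp_a", "sp_c", "sp_g"}
print(origin_by_lca(present))             # clade_3
print(origin_by_threshold(present, 0.5))  # clade_2
```

In this toy case a single ortholog call in a distant species pushes the last-common-ancestor estimate back to the oldest clade, while the threshold estimate depends on the chosen cut-off. This kind of parameter sensitivity is what a clustering-based approach aims to avoid.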

METHODS

A phylogenetic profile is simply an array indicating the presence or absence of a gene in a list of species. These profiles were compared and clustered to discern patterns in gene occurrences across >800 fungal species, centering the analysis on the budding yeast .
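As a minimal sketch of the idea, the Python below stores each profile as a list of 0/1 values over an ordered species list and groups genes with similar profiles. The abstract does not state which distance measure or clustering algorithm was used, so the Jaccard distance, the single-linkage grouping, the `max_distance` cut-off, and all gene and species names here are assumptions chosen for illustration only.

```python
from itertools import combinations


def jaccard_distance(p, q):
    """1 - |both present| / |either present| for two 0/1 profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster_profiles(profiles, max_distance=0.25):
    """Single-linkage grouping: genes end up in the same cluster when a
    chain of pairwise distances <= max_distance connects them."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        # Union-find lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g, h in combinations(profiles, 2):
        if jaccard_distance(profiles[g], profiles[h]) <= max_distance:
            parent[find(g)] = find(h)

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical species order and presence (1) / absence (0) calls.
SPECIES = ["sp_a", "sp_b", "sp_c", "sp_d", "sp_e"]
PROFILES = {
    "gene_1": [1, 1, 1, 1, 1],
    "gene_2": [1, 1, 1, 1, 0],
    "gene_3": [1, 1, 0, 0, 0],
    "gene_4": [1, 0, 0, 0, 0],
}

print(cluster_profiles(PROFILES))
# [['gene_1', 'gene_2'], ['gene_3'], ['gene_4']]
```

For a data set covering more than 800 species, a library routine such as SciPy's hierarchical clustering would likely be a more practical choice than this all-pairs loop.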

RESULTS

Clear epochs of gene origination were observed linked to the last common ancestors of and , and also to and earlier ancestors. These trends are independent of the proteome and genome-assembly quality of the underlying data. Clusters of phylogenetic profiles demonstrated some significant functional associations, such as to cellular spore formation and chromosome segregation in genes originating in . The phylogenetic profile clustering analysis enabled detection of parameter-independent trends in intrinsic disorder, prion-like composition and gene uniqueness as a function of epochal gene age. For example: new proteins with prion-like domains have arisen at a similar rate for most of fungal evolution centred on ; the most proteins with mild intrinsic disorder have appeared during the early epoch rather than more recently; and very recently formed genes are the least likely to be single-copy (i.e., 'unique' yeast proteins).

CONCLUSIONS

For individual proteins, the profile cluster data generated here are useful for investigating experimental hypotheses, since they provide evidence for functional linkages that have yet to be discerned.

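Abstract (translated from Chinese)

BACKGROUND

Genes are continually formed and lost as a genome evolves. However, new genes may tend to appear during particular evolutionary epochs rather than others, or to disappear together in a more recent organismal clade. Methods for identifying gene origination may simply take the last common ancestor containing an ortholog as the putative point of origination, or apply a heuristic threshold that tolerates a certain number of missing orthologs among the species examined. Here, to avoid these issues, an alternative approach based on clustering phylogenetic profiles is applied, and the results are examined for evidence of epochal trends in gene origination and of associated trends in specific sequence traits or functional associations.

METHODS

A phylogenetic profile is simply an array indicating whether a gene is present or absent in each of a series of species. These profiles were compared and clustered to identify patterns of gene occurrence across more than 800 fungal species, with the analysis centered on budding yeast.

RESULTS

Clear epochs of gene origination were observed, associated with the last common ancestor of Saccharomyces cerevisiae and Schizosaccharomyces pombe, and with Candida albicans and earlier ancestors. These trends are independent of the proteome and genome-assembly quality of the underlying data. Clusters of phylogenetic profiles showed some significant functional associations, for example with cellular spore formation and chromosome segregation among genes originating in Saccharomyces cerevisiae. The phylogenetic profile clustering analysis made it possible to detect parameter-independent trends in intrinsic disorder, prion-like composition and gene uniqueness as a function of epochal gene age. For example: new proteins with prion-like domains have arisen at a similar rate for most of fungal evolution centered on Saccharomyces cerevisiae; most proteins with mild intrinsic disorder appeared during the early Saccharomyces cerevisiae epoch rather than more recently; and the most recently formed genes are the least likely to be single-copy (i.e., 'unique' yeast proteins).

CONCLUSIONS

For individual proteins, the profile cluster data generated here can be used to investigate experimental hypotheses, because they provide evidence for functional linkages that have not yet been identified.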

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc16/12045281/62f226b4fc5a/peerj-13-19370-g001.jpg
