Institute of Biomedical Informatics, National Yang-Ming University, Taipei, 112, Taiwan.
Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan.
Sci Rep. 2017 Oct 27;7(1):14210. doi: 10.1038/s41598-017-13297-0.
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes grows, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis of presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering complete sets of proteins across the three superkingdoms of life: Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters which, following an enrichment analysis of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit taxon-characteristic structural and functional properties. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains that were created early in evolution, retained in living organisms, and characterized by basic cellular functions and ancient structural architectures. An Archaea-Eukarya bi-superkingdom cluster suggests that its PDs may have existed in the ancestor of the two superkingdoms, and other clusters are specific to a single superkingdom or taxon (e.g. Fungi). These results increase our appreciation of PD diversity and our knowledge of how PDs are used across species, with implications for species evolution.
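The abstract does not specify the bi-clustering algorithm or the statistic used for the Gene Ontology enrichment analysis. As a hedged illustration only, the sketch shows two steps under stated assumptions. First, it arranges species-level PD annotations into the binary domain-by-species presence/absence matrix on which such a bi-clustering would operate. Second, it computes a one-sided hypergeometric p-value, a common choice for enrichment testing that is assumed here rather than taken from the paper. The function names, the domain IDs (`PD1` to `PD3`), and the species labels are hypothetical.

```python
from math import comb


def presence_absence_matrix(annotations, species, domains):
    """Build a binary domain-by-species matrix (hypothetical helper).

    annotations maps each species name to the set of domain IDs found in its
    proteome. Row i corresponds to domains[i], column j to species[j];
    an entry is 1 if the domain occurs in that species and 0 otherwise.
    """
    return [[1 if d in annotations.get(s, set()) else 0 for s in species]
            for d in domains]


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) (assumed enrichment test).

    N: number of domains in the background set
    K: background domains annotated with a given GO term
    n: number of domains in the cluster being tested
    k: cluster domains annotated with that GO term
    """
    # Sum P(X = i) = C(K, i) * C(N - K, n - i) / C(N, n) over the upper tail.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy data: one hypothetical species per superkingdom.
    species = ["bacterium_1", "archaeon_1", "eukaryote_1"]
    domains = ["PD1", "PD2", "PD3"]
    annotations = {
        "bacterium_1": {"PD1", "PD2"},
        "archaeon_1": {"PD1", "PD3"},
        "eukaryote_1": {"PD1", "PD3"},
    }
    matrix = presence_absence_matrix(annotations, species, domains)
    for d, row in zip(domains, matrix):
        print(d, row)
    # PD1 [1, 1, 1]  present in all three (ubiquitous pattern)
    # PD2 [1, 0, 0]  bacteria-only pattern
    # PD3 [0, 1, 1]  archaea + eukarya pattern

    # All 3 cluster domains carry a term held by 4 of 10 background domains.
    print(hypergeom_enrichment_p(k=3, n=3, K=4, N=10))  # 4/120, about 0.0333
```

In the study itself the clusters arise from bi-clustering the full matrix of 6,580 PDs by 2,134 species rather than from inspecting row patterns by eye; and when many GO terms are tested per cluster, p-values such as these would typically also need a multiple-testing correction.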