Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas, Campus Universidad Pablo de Olavide, Seville, Spain.
CNRS, Inserm, CHU Lille, Institut Pasteur de Lille, U1019-UMR 9017-CIIL-Centre d'Infection et d'Immunité de Lille, University of Lille, Lille, France.
PLoS Comput Biol. 2024 Sep 16;20(9):e1012459. doi: 10.1371/journal.pcbi.1012459. eCollection 2024 Sep.
An often-overlooked aspect of biology is the set of outliers in the protein length distribution, specifically proteins with more than 5000 amino acids, which we refer to as huge proteins (HPs). By examining UniProtKB, we identified more than 41,000 HPs across the tree of life, most of them in eukaryotes. Notably, the phyla with the highest propensity for HPs are Apicomplexa and Fornicata. Moreover, we observed that certain bacterial phyla, such as Elusimicrobiota and Planctomycetota, have a higher tendency to encode HPs than even the average eukaryote. To investigate whether these macro-polypeptides represent "real" proteins, we explored several indirect metrics. Additionally, orthology analyses reveal thousands of clusters of homologous HP sequences, pointing in eukaryotes to functional groups related to key cellular processes such as cytoskeleton organization, chaperone activity, and E3 ubiquitin ligase activity. In bacteria, the major clusters have functions related to non-ribosomal peptide synthesis/polyketide synthesis, followed by surface proteins involved in pathogen-host attachment or recognition. Further exploration of the annotations of individual HPs supported the previously identified functional groups. These findings underscore the need for further investigation of the cellular and ecological roles of HPs and their potential impact on biology and biotechnology.
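The size criterion and the per-phylum comparison can be illustrated with a minimal Python sketch. It assumes the sequences are already available as (phylum, sequence) pairs. It also assumes that "propensity" means the fraction of a phylum's proteins that exceed 5000 residues. The function names and the toy records are hypothetical. This is not the authors' pipeline, and the toy records are not UniProtKB data.

```python
from collections import defaultdict

# HP definition from the text: strictly more than 5000 amino acids.
HP_MIN_LENGTH = 5000


def is_huge_protein(sequence: str) -> bool:
    """Return True if the sequence is longer than 5000 residues."""
    return len(sequence) > HP_MIN_LENGTH


def hp_propensity(records):
    """Hypothetical helper: count HPs per phylum.

    records: iterable of (phylum, sequence) pairs.
    Returns {phylum: (n_hp, n_total, fraction_hp)}, where "propensity" is
    ASSUMED here to be the simple fraction of proteins that are HPs.
    """
    totals = defaultdict(int)
    hps = defaultdict(int)
    for phylum, seq in records:
        totals[phylum] += 1
        if is_huge_protein(seq):
            hps[phylum] += 1
    return {p: (hps[p], totals[p], hps[p] / totals[p]) for p in totals}


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    toy_records = [
        ("PhylumA", "M" * 6000),  # HP
        ("PhylumA", "M" * 300),   # not an HP
        ("PhylumB", "M" * 5000),  # exactly 5000: not an HP under ">5000"
    ]
    for phylum, stats in hp_propensity(toy_records).items():
        print(phylum, stats)
```

Because the criterion is "more than 5000 amino acids," a sequence of exactly 5000 residues is not counted. A real analysis would likely also need to normalize for proteome completeness and sampling depth. This sketch does not attempt that.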