Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain.
Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain.
Genome Biol Evol. 2024 Jul 3;16(7). doi: 10.1093/gbe/evae126.
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be shorter than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in translation level. These results suggest that variation in the translation level of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.