Higdon Andrea L, Won Nathan H, Brar Gloria A
bioRxiv. 2023 Jul 14:2023.07.13.548938. doi: 10.1101/2023.07.13.548938.
Genome-wide measurements of ribosome occupancy on mRNA transcripts have enabled global empirical identification of translated regions. These approaches have revealed an unexpected diversity of protein products, but high-confidence identification of new coding regions that entirely overlap annotated coding regions, including those that encode truncated protein isoforms, has remained challenging. Here, we develop a sensitive and robust algorithm for genome-wide identification of N-terminally truncated proteins and identify 388 truncated protein isoforms, a more than 30-fold increase in the number known in budding yeast. We perform extensive experimental validation of these truncated proteins and define two general classes. Proteins in the first class lack large portions of the annotated protein sequence and tend to be produced from a truncated transcript. We show two such cases, Yap5 and Pus1, to have condition-specific regulation and functions that appear distinct from their respective annotated isoforms. Proteins in the second class lack only a small region of the annotated protein and are less likely to be regulated by an alternative transcript isoform. Many localize to different subcellular compartments than their annotated counterparts, representing a common strategy for achieving dual localization of otherwise functionally identical proteins.