Muenzen Kathleen, Monroy Jenna, Finseth Findley R
Keck Science Department, Claremont McKenna, Pitzer, and Scripps Colleges, Claremont, CA 91711.
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98105.
G3 (Bethesda). 2019 Apr 9;9(4):1103-1115. doi: 10.1534/g3.118.200714.
The protein titin plays a key role in vertebrate muscle where it acts like a giant molecular spring. Despite its importance and conservation over vertebrate evolution, a lack of high-quality annotations in non-model species makes comparative evolutionary studies of titin challenging. The PEVK region of titin, named for its high proportion of Pro-Glu-Val-Lys amino acids, is particularly difficult to annotate due to its abundance of alternatively spliced isoforms and short, highly repetitive exons. To understand PEVK evolution across mammals, we developed a bioinformatics tool, PEVK_Finder, to annotate PEVK exons from genomic sequences of titin and applied it to a diverse set of mammals. PEVK_Finder consistently outperforms standard annotation tools across a broad range of conditions and improves annotations of the PEVK region in non-model mammalian species. We find that the PEVK region can be divided into two subregions (PEVK-N, PEVK-C) with distinct patterns of evolutionary constraint and divergence. The bipartite nature of the PEVK region has implications for titin diversification. In the PEVK-N region, certain exons are conserved and may be essential, but natural selection also acts on particular codons. In the PEVK-C region, exons are more homogeneous and length variation of the PEVK region may provide the raw material for evolutionary adaptation in titin function. The PEVK-C region can be further divided into a highly repetitive region (PEVK-CA) and one that is more variable (PEVK-CB). Taken together, we find that the very complexity that makes titin a challenge for annotation tools may also promote evolutionary adaptation.
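The abstract describes the PEVK region as named for its high proportion of Pro-Glu-Val-Lys residues. As a rough illustration of what that proportion means, the following minimal sketch computes the fraction of P, E, V, and K residues in a one-letter amino acid string. It is not the PEVK_Finder algorithm, which the abstract does not specify; the function name `pevk_fraction` and the example peptide are hypothetical.

```python
# Illustrative only: this is NOT PEVK_Finder's method, just a composition measure.
PEVK_RESIDUES = frozenset("PEVK")  # one-letter codes for Pro, Glu, Val, Lys


def pevk_fraction(peptide: str) -> float:
    """Return the fraction of residues in `peptide` that are P, E, V, or K."""
    residues = peptide.upper()
    if not residues:
        return 0.0  # define the fraction of an empty sequence as zero
    return sum(r in PEVK_RESIDUES for r in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical peptide, not taken from any titin sequence.
    example = "PEVKAAAA"
    print(f"PEVK fraction of {example}: {pevk_fraction(example):.2f}")
```

A composition measure like this says nothing about exon boundaries or splicing, which is where the abstract locates the annotation difficulty.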