Department of Plant Sciences, College of Agriculture and Bioresources, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada.
Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK, S7N 4L8, Canada.
BMC Genomics. 2022 Jul 23;23(1):534. doi: 10.1186/s12864-022-08735-x.
Ribosomally synthesized cyclic peptides are widely found in plants and exhibit bioactivities that are useful to humans. The growing number of sequenced genomes facilitates the identification of cyclic peptide sequences and their precursor proteins. While previous research has largely focused on the chemical diversity of these peptides across species, little attention has been paid to the broader range of potential peptides that have not been chemically identified.
A pioneering study was initiated to explore the genetic diversity of linusorbs, a group of cyclic peptides that occur uniquely in cultivated flax (Linum usitatissimum). Phylogenetic analysis clustered the 5 known linusorb precursor proteins into two clades and one singleton. A preliminary tBLASTn search of the published flax genome, using a whole precursor protein sequence as the query, retrieved only homologues from the query's own clade. This limitation was overcome with a profile-based mining strategy. After genome reannotation, a hidden Markov model (HMM)-based approach identified 58 repeats in 8 novel proteins that are homologous to the linusorb-embedded repeats, implying common ancestry between the two groups of repeats. Subsequently, we developed a customized profile composed of a random linusorb-like domain (LLD) flanked by 5 conserved sites and used it for a string search of the proteome, which extracted 281 LLD-containing repeats (LLDRs) in 25 proteins. Comparative analysis of the different repeat categories suggested that the 5 conserved flanking sites shared among the non-homologous repeats arose through convergent evolution driven by functional selection.
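As a rough illustration of the string-search step, a profile of this kind (a variable core flanked by fixed positions) can be expressed as a regular expression and scanned across a proteome. The sketch below is not the published profile: the conserved residues (G, D, F, K, E), their spacing, the 8 to 10 residue core length, the `find_lldrs` helper, and the toy sequences are all hypothetical placeholders, since the abstract does not specify them.

```python
import re

# HYPOTHETICAL profile, for illustration only (not the one used in the study):
# two assumed conserved sites upstream (G, D), a variable core of 8-10 residues
# standing in for the linusorb-like domain, and three assumed conserved sites
# downstream (F, K, E). "[A-Z]" means any residue at that position.
PROFILE = re.compile(r"G[A-Z]D(?P<lld>[A-Z]{8,10})F[A-Z]K[A-Z]E")


def find_lldrs(proteome, min_repeats=2):
    """Return {protein_id: [core sequences]} for proteins with enough profile hits.

    `proteome` maps protein identifiers to amino-acid strings (one-letter code).
    `min_repeats` is an assumed threshold for calling a protein repeat-containing.
    """
    hits = {}
    for protein_id, sequence in proteome.items():
        # finditer yields non-overlapping matches, scanning left to right
        cores = [m.group("lld") for m in PROFILE.finditer(sequence)]
        if len(cores) >= min_repeats:
            hits[protein_id] = cores
    return hits


# Toy proteome: one made-up precursor with two repeats, one unrelated protein.
toy_proteome = {
    "toy_precursor": "MA" + "GADILVFPLLIFAKAE" + "GS" + "GTDMLIPPFFWIFSKTE" + "AA",
    "toy_unrelated": "MKTAYIAKQRQISFVKSHFSRQ",
}

result = find_lldrs(toy_proteome)
for protein_id, cores in result.items():
    print(protein_id, len(cores), cores)
```

A regular expression only captures the fixed flanking positions, so it can pick up repeats that share those sites without sharing overall sequence similarity, which is the property that distinguishes this step from the homology-based HMM search.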
The profile-based mining approach is suitable for analyzing repetitive sequences. The 25 LLDR proteins identified herein represent the potential diversity of cyclic peptides within the flax genome and lay a foundation for further studies on the functions and evolution of these protein tandem repeats.