Zhang Jun, Hua Zhengshuang, Huang Zebo, Chen QiZhu, Long Qingyun, Craik David J, Baker Alan J M, Shu Wensheng, Liao Bin
School of Biosciences and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, Guangdong Pharmaceutical University, Guangzhou, 510006, China.
Planta. 2015 Apr;241(4):929-40. doi: 10.1007/s00425-014-2229-5. Epub 2014 Dec 21.
Two high-throughput tools harvest hundreds of novel cyclotides and analogues in plants. Cyclotides are gene-encoded, backbone-cyclized polypeptides that display a diverse range of bioactivities associated with plant defense. However, genome-scale or database-scale evaluations of cyclotides have so far been rare. Here, a novel, time-efficient Perl program, CyPerl, was developed to search for cyclotides in the predicted ORFs of 34 available plant genomes and in existing plant protein sequences from GenBank databases. CyPerl-isolated sequences were further analyzed in a user-friendly Excel (Microsoft Office) template, CyExcel, by removing repeats, evaluating their cysteine-distributed regions (CDRs) and comparing them with CyBase-collected cyclotides. In the genome screen, CyPerl and CyExcel identified 186 ORFs containing 145 unique cyclotide analogues from 30 tested plant genomes belonging to 10 plant families. Phaseolus vulgaris and Zea mays were the two species richest in cyclotide analogues among the plants tested. In the protein database screen, 266 unique cyclotides and analogues were identified from seven plant families. Merging the results with 288 unique CyBase-listed cyclotides yielded 510 unique cyclotides and analogues from 13 plant families. In total, seven novel plant families containing cyclotide analogues and 202 novel cyclotide analogues were identified in this study. This study has established two BLAST-independent tools for screening cyclotides from plant genomes and protein databases, and has significantly widened the known plant distribution and sequence diversity of cyclotides and their analogues.
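The screening workflow described above (pattern search over predicted ORFs, removal of repeats, extraction of cysteine-distributed regions, comparison with known entries) can be illustrated with a minimal Python sketch. This is not CyPerl or CyExcel: the abstract does not state their search criteria, so the six-cysteine spacing pattern, the function names `find_cdrs` and `split_known_novel`, and the toy sequences are all assumptions made for illustration only.

```python
import re

# Hypothetical six-cysteine spacing pattern (an illustrative assumption,
# NOT the criteria used by CyPerl): C x3 C x4-8 C x3-7 C x1 C x4-5 C,
# where the loop residues x are any non-cysteine characters.
CDR_PATTERN = re.compile(
    r"C[^C]{3}C[^C]{4,8}C[^C]{3,7}C[^C]C[^C]{4,5}C"
)


def find_cdrs(orfs):
    """Scan ORF protein sequences for the pattern above.

    `orfs` maps an ORF name to its amino-acid sequence. Returns a dict
    mapping each unique matched region to the first ORF it was seen in,
    so repeated hits are collapsed (the "removing repeats" step).
    """
    unique = {}
    for name, seq in orfs.items():
        for match in CDR_PATTERN.finditer(seq.upper()):
            unique.setdefault(match.group(), name)
    return unique


def split_known_novel(cdrs, known):
    """Partition matched regions into those already in a reference
    collection (e.g. regions derived from a database export) and the rest."""
    known_set = set(known)
    known_hits = [c for c in cdrs if c in known_set]
    novel = [c for c in cdrs if c not in known_set]
    return known_hits, novel


if __name__ == "__main__":
    # Toy input: two ORFs sharing one motif and one ORF without cysteines.
    toy_orfs = {
        "orf1": "MKAACGETCVGGTCNTPGCTCSWPVCTRNGLPV",
        "orf2": "GLPVCGETCVGGTCNTPGCTCSWPVCTRN",
        "orf3": "MSTNPKPQRKTKRNT",
    }
    hits = find_cdrs(toy_orfs)
    known_hits, novel = split_known_novel(hits, known=[])
    print(len(hits), "unique region(s);", len(novel), "not in reference set")
```

A regular-expression scan of this kind needs no sequence alignment, which is one plausible reading of "BLAST-independent"; whether the published tools work this way is not confirmed by the abstract.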