RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants.

Authors

Li Pingchuan, Quan Xiande, Jia Gaofeng, Xiao Jin, Cloutier Sylvie, You Frank M

Affiliations

Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada.

University of Saskatchewan, Saskatoon, SK, S7N 5A8, Canada.

Publication information

BMC Genomics. 2016 Nov 2;17(1):852. doi: 10.1186/s12864-016-3197-x.

Abstract

BACKGROUND

Resistance gene analogs (RGAs), such as NBS-encoding proteins, receptor-like protein kinases (RLKs) and receptor-like proteins (RLPs), are potential R-genes that contain specific conserved domains and motifs. Thus, RGAs can be predicted from their conserved structural features using bioinformatics tools. Computer programs have been developed to identify individual domains and motifs in the protein sequences of RGAs, but none offers a systematic assessment of the different types of RGAs. A user-friendly and efficient pipeline is needed for large-scale, genome-wide RGA prediction across the growing number of sequenced plant genomes.

RESULTS

An integrative pipeline, named RGAugury, was developed to automate RGA prediction. The pipeline first identifies RGA-related protein domains and motifs, namely nucleotide binding site (NB-ARC), leucine-rich repeat (LRR), transmembrane (TM), serine/threonine and tyrosine kinase (STTK), lysin motif (LysM), coiled-coil (CC) and Toll/Interleukin-1 receptor (TIR). RGA candidates are then identified and classified into four major families according to which combinations of these domains and motifs are present: NBS-encoding, TM-CC, membrane-associated RLP, and membrane-associated RLK. All time-consuming analyses in the pipeline are run in parallel to improve performance. The pipeline was evaluated using the well-annotated Arabidopsis genome: 98.5%, 85.2%, and 100% of the reported NBS-encoding genes, membrane-associated RLPs, and membrane-associated RLKs were validated, respectively. The pipeline was also successfully applied to predict RGAs in 50 sequenced plant genomes. A user-friendly web interface was implemented to ease command-line operations, facilitate visualization, and simplify result management for multiple datasets.
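
The classification step described above can be illustrated with a minimal sketch. The domain labels come from the abstract, but the specific rules and their precedence order below are simplifying assumptions made for illustration; they are not RGAugury's actual decision logic, and the function name `classify_rga` and the example protein IDs are hypothetical.

```python
from typing import Iterable

# Domain/motif labels as named in the abstract.
KNOWN_DOMAINS = {"NB-ARC", "LRR", "TM", "STTK", "LysM", "CC", "TIR"}


def classify_rga(domains: Iterable[str]) -> str:
    """Assign a protein to one of the four RGA families, or 'non-RGA'.

    ASSUMPTION: the rules and their order are illustrative only and are
    not taken from the RGAugury source code.
    """
    found = set(domains)
    unknown = found - KNOWN_DOMAINS
    if unknown:
        raise ValueError(f"unrecognized domain labels: {sorted(unknown)}")

    # Assumed rule 1: any protein with an NB-ARC domain is NBS-encoding.
    if "NB-ARC" in found:
        return "NBS-encoding"
    # Assumed rule 2: transmembrane region plus a kinase domain -> RLK.
    if "TM" in found and "STTK" in found:
        return "RLK"
    # Assumed rule 3: transmembrane region plus LRR or LysM, no kinase -> RLP.
    if "TM" in found and found & {"LRR", "LysM"}:
        return "RLP"
    # Assumed rule 4: transmembrane region plus coiled-coil -> TM-CC.
    if "TM" in found and "CC" in found:
        return "TM-CC"
    return "non-RGA"


if __name__ == "__main__":
    # Hypothetical proteins with hypothetical domain annotations.
    examples = {
        "protein_1": ["CC", "NB-ARC", "LRR"],
        "protein_2": ["LRR", "TM", "STTK"],
        "protein_3": ["LysM", "TM"],
        "protein_4": ["TM", "CC"],
        "protein_5": ["LRR"],
    }
    for protein_id, domains in examples.items():
        print(protein_id, classify_rga(domains))
```

Checking the most specific signatures first keeps each protein in exactly one family; a real pipeline would also need the upstream domain-detection results and finer subclasses, which this sketch leaves out.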

CONCLUSIONS

RGAugury is an efficient, integrative bioinformatics tool for large-scale, genome-wide identification of RGAs. It is freely available on Bitbucket (https://bitbucket.org/yaanlpc/rgaugury).
