Toda Nicholas, Rustenholz Camille, Baud Agnès, Le Paslier Marie-Christine, Amselem Joelle, Merdinoglu Didier, Faivre-Rampant Patricia
Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux (EPGV), 91000 Evry, France.
Université de Strasbourg, INRAE, SVQV UMR A 1131, 68000 Colmar, France.
Genes (Basel). 2020 Mar 20;11(3):333. doi: 10.3390/genes11030333.
Although a number of bioinformatic tools identify plant nucleotide-binding leucine-rich repeat (NLR) disease resistance genes from conserved protein sequences, only a few attempt to identify resistance genes that have not been annotated in the genome. The overall goal of the NLGenomeSweeper pipeline is to annotate NLR disease resistance genes, including RPW8, in a genome assembly with high specificity and a focus on complete functional genes. The pipeline uses the BLAST suite to identify the complete NB-ARC domain, the most conserved domain of NLR genes. As a result, the tool has high specificity for complete genes and relatively intact pseudogenes. It returns all candidate NLR gene locations together with InterProScan ORF and domain annotations for manual curation of the gene structure.
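The idea of keeping only hits that span the complete NB-ARC domain can be illustrated with a minimal sketch. This is not the NLGenomeSweeper implementation: it assumes BLAST tabular output with the default 12 columns (as produced by `-outfmt 6`), and the function names, the domain length of 288 residues, and the 80% coverage cutoff are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    contig: str
    start: int    # leftmost subject coordinate (1-based)
    end: int      # rightmost subject coordinate
    strand: str   # "+" or "-"
    q_start: int  # position in the query domain where the match begins
    q_end: int    # position in the query domain where the match ends


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse default 12-column BLAST tabular lines into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        q_start, q_end = int(f[6]), int(f[7])
        s_start, s_end = int(f[8]), int(f[9])
        # In translated searches, sstart > send marks a minus-strand hit.
        strand = "+" if s_start <= s_end else "-"
        hits.append(Hit(f[1], min(s_start, s_end), max(s_start, s_end),
                        strand, q_start, q_end))
    return hits


def filter_complete(hits: Iterable[Hit], domain_len: int,
                    min_cov: float = 0.8) -> List[Hit]:
    """Keep hits covering at least min_cov of the query domain (assumed cutoff)."""
    return [h for h in hits
            if (h.q_end - h.q_start + 1) / domain_len >= min_cov]


def merge_loci(hits: Iterable[Hit]) -> List[Tuple[str, int, int, str]]:
    """Merge overlapping hits on the same contig and strand into candidate loci."""
    loci: List[Tuple[str, int, int, str]] = []
    for h in sorted(hits, key=lambda h: (h.contig, h.strand, h.start)):
        if (loci and loci[-1][0] == h.contig and loci[-1][3] == h.strand
                and h.start <= loci[-1][2]):
            contig, start, end, strand = loci[-1]
            loci[-1] = (contig, start, max(end, h.end), strand)
        else:
            loci.append((h.contig, h.start, h.end, h.strand))
    return loci
```

Calling `merge_loci(filter_complete(parse_tabular(lines), domain_len=288))` yields one `(contig, start, end, strand)` tuple per candidate locus. Filtering on query coverage before merging is what biases such a sketch toward complete domains: short fragments are dropped rather than stitched into a locus.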