ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India.
ICAR-Indian Agricultural Research Institute (ICAR-IARI), New Delhi, India.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac348.
Maintaining duplicate germplasm in genebanks hampers the effective conservation and utilization of genebank resources. Redundant germplasm raises the cost of conservation by directing a large proportion of a genebank's financial resources towards conservation rather than towards enriching diversity. Moreover, genome-wide association analysis using an association panel with over-represented germplasm can be biased, resulting in spurious marker-trait associations. Conventional methods of germplasm duplicate removal based on passport information suffer from incomplete or missing passport records and from data handling errors at various stages of germplasm enrichment. These limitations are less likely with genotypic data. We therefore developed a web-based tool, the Germplasm Duplicate Identification and Removal Tool (G-DIRT), which pre-processes genotypic data and identifies germplasm duplicates by identity-by-state analysis of single-nucleotide polymorphism genotyping information. A homozygous genotypic difference threshold of 0.1% for germplasm duplicates was determined using tetraploid wheat genotypic data, with an accuracy of 94.97%. Based on the genotypic differences, the tool also builds a dendrogram that visually depicts the relationships between genotypes. To overcome the constraints of high-dimensional genotypic data, an offline version of G-DIRT that runs in R has also been developed. G-DIRT is expected to help genebank curators, breeders and other researchers across the world identify germplasm duplicates in global genebank collections using only easily shareable genotypic data, instead of physically exchanging seeds or propagating material. The web server complements existing methods of germplasm duplicate identification based on passport or phenotypic information and is freely accessible at http://webtools.nbpgr.ernet.in/gdirt/.
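The thresholding idea in the abstract can be illustrated with a small Python sketch. This is not the G-DIRT implementation: the genotype coding (0 and 2 for the two homozygous classes, 1 for heterozygous, -1 for missing), the definition of the difference as mismatches divided by loci that are homozygous in both accessions, the use of "less than or equal to" at the 0.1% threshold, and the names `homozygous_difference` and `find_duplicates` are all assumptions made for illustration.

```python
from itertools import combinations

MISSING = -1          # assumed code for a missing call
THRESHOLD = 0.001     # 0.1% homozygous genotypic difference, as quoted in the abstract


def homozygous_difference(a, b):
    """Fraction of mismatching loci among loci homozygous (0 or 2) in both samples.

    Heterozygous (1) and missing (-1) calls are skipped. Returns None when
    no locus can be compared.
    """
    compared = 0
    mismatched = 0
    for ga, gb in zip(a, b):
        if ga in (0, 2) and gb in (0, 2):
            compared += 1
            if ga != gb:
                mismatched += 1
    if compared == 0:
        return None
    return mismatched / compared


def find_duplicates(genotypes, threshold=THRESHOLD):
    """Return (name_a, name_b, difference) for every pair at or below the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(sorted(genotypes.items()), 2):
        diff = homozygous_difference(a, b)
        if diff is not None and diff <= threshold:
            pairs.append((name_a, name_b, diff))
    return pairs


# Toy data: 2000 SNP loci per accession (hypothetical accession names).
acc_a = [0, 2] * 1000
acc_b = list(acc_a)
acc_b[0] = 2          # a single homozygous mismatch against acc_a
acc_b[5] = 1          # heterozygous call, excluded from the comparison
acc_b[6] = MISSING    # missing call, excluded from the comparison
acc_c = list(acc_a)
for i in range(10, 20):
    acc_c[i] = 2 - acc_c[i]   # ten homozygous mismatches against acc_a

duplicates = find_duplicates({"ACC_A": acc_a, "ACC_B": acc_b, "ACC_C": acc_c})
for name_a, name_b, diff in duplicates:
    print(f"{name_a} and {name_b}: difference {diff:.4%}")
```

In this toy set only ACC_A and ACC_B fall under the threshold (one mismatch in 1998 comparable loci), while ACC_C differs from both at roughly 0.5% and is kept as distinct. A pairwise difference matrix built this way could also feed a hierarchical clustering routine to draw a dendrogram of the kind the abstract describes.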