CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Joint Center for Genomics Research (JCGR), King Abdulaziz City for Science and Technology and Chinese Academy of Sciences, Riyadh 11442, Saudi Arabia; Grail Scientific Co. Ltd., Shenyang 110000, China.
CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Genomics Proteomics Bioinformatics. 2018 Oct;16(5):373-381. doi: 10.1016/j.gpb.2018.03.006. Epub 2018 Dec 21.
The rapid development of high-throughput sequencing technologies has dramatically reduced the cost and time required for de novo genome sequencing and genome resequencing projects, and new genome sequences are released every week. The resulting plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks efficiently, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. On the four DNA-seq datasets tested in this study, RGAAT detects sequence variants with precision, specificity, and sensitivity comparable to GATK, and with higher precision and specificity than Freebayes and SAMtools. RGAAT can also identify sequence variants based on cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking into account the true allele frequency. Finally, RGAAT uses the sequence variants to generate a coordinate conversion file between the reference and query genomes and supports annotation file transfer. Compared to the rapid annotation transfer tool (RATT), RGAAT displays better performance characteristics for annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is freely available at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2.
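The idea of deriving a coordinate conversion from sequence variants can be illustrated with a minimal Python sketch. This is not RGAAT's implementation or file format; the function names `build_breakpoints` and `ref_to_query`, the tuple layout of the variants, and the example data are all hypothetical. It assumes 1-based reference positions and sorted, non-overlapping variants written VCF-style (with an anchor base for indels), and it does not treat positions inside a deleted region specially.

```python
from bisect import bisect_right

def build_breakpoints(variants):
    """Turn variants into (first_ref_pos_after_variant, cumulative_shift) pairs.

    `variants` is a list of (ref_pos, ref_allele, alt_allele) tuples
    (hypothetical layout, 1-based positions, non-overlapping).
    Substitutions of equal length do not change coordinates and are skipped.
    """
    breakpoints = []
    shift = 0
    for pos, ref, alt in sorted(variants):
        delta = len(alt) - len(ref)  # >0 insertion, <0 deletion
        if delta == 0:
            continue
        shift += delta
        breakpoints.append((pos + len(ref), shift))
    return breakpoints

def ref_to_query(pos, breakpoints):
    """Map a reference position to the query (consensus) coordinate."""
    starts = [start for start, _ in breakpoints]
    i = bisect_right(starts, pos)  # number of indels fully upstream of pos
    return pos + (breakpoints[i - 1][1] if i else 0)

# Illustrative data only: a 2-bp insertion after position 5
# and a 2-bp deletion of positions 11-12.
example_variants = [(5, "A", "ATT"), (10, "GCC", "G")]
example_breakpoints = build_breakpoints(example_variants)
```

With these example variants, a feature at reference position 6 shifts to query position 8, while position 13 maps back to 13 because the deletion cancels the earlier insertion. Transferring an annotation under this scheme amounts to converting the start and end coordinates of each feature.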