Chen Wenyi, Achakkagari Sai Reddy, Strömvik Martina
Department of Plant Science, McGill University, Sainte-Anne-de-Bellevue, QC, Canada.
Front Plant Sci. 2022 Nov 3;13:1011948. doi: 10.3389/fpls.2022.1011948. eCollection 2022.
Plastome sequence data are most often extracted from plant whole genome sequencing data and need to be assembled and annotated separately from the nuclear genome sequence. In projects comprising multiple genomes, processing the plastomes individually is labour-intensive because it requires many steps and software tools. This study developed - an automated pipeline for both assembly and annotation of plastomes, with the aim of letting the researcher load whole genome sequence data with minimal manual input and therefore achieve a faster runtime. The main structure of the current automated pipeline includes trimming of adaptor and low-quality sequences using , plastome assembly using , standardization and quality checking of the assembled genomes through a custom script utilizing and , annotation of the assembled genomes using , and finally generation of the files required for NCBI GenBank submission. The pipeline is demonstrated with 12 potato accessions and three soybean accessions.
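The stage order described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sample` class, the `make_stage` helper, the stage labels and the sample names are all hypothetical. Each stage is a placeholder that only records its label, because the tools used at each step are not named here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """One accession moving through the pipeline (hypothetical structure)."""
    name: str
    completed: List[str] = field(default_factory=list)


Stage = Callable[[Sample], Sample]


def make_stage(label: str) -> Stage:
    """Build a placeholder stage that only records its label.

    A real stage would invoke the corresponding external tool here.
    """
    def stage(sample: Sample) -> Sample:
        sample.completed.append(label)
        return sample
    return stage


# Stage order follows the abstract; the labels are assumptions for illustration.
STAGES: List[Stage] = [
    make_stage("trim_adaptors_and_low_quality"),
    make_stage("assemble_plastome"),
    make_stage("standardize_and_quality_check"),
    make_stage("annotate"),
    make_stage("generate_genbank_submission_files"),
]


def run_pipeline(names: List[str]) -> List[Sample]:
    """Run every sample through all stages in order, without manual steps."""
    results = []
    for name in names:
        sample = Sample(name)
        for stage in STAGES:
            sample = stage(sample)
        results.append(sample)
    return results


if __name__ == "__main__":
    # Hypothetical sample names standing in for real accessions.
    for s in run_pipeline(["potato_01", "soybean_01"]):
        print(s.name, "->", ", ".join(s.completed))
```

Keeping the stages in one ordered list means a batch of accessions goes through identical steps, which is the kind of repetition the pipeline is meant to take off the researcher's hands.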