Research Center for Creative Partnerships, Ishinomaki Senshu University, Ishinomaki, Miyagi, Japan.
PeerJ. 2023 Nov 29;11:e16446. doi: 10.7717/peerj.16446. eCollection 2023.
The mitochondrial genomes (mitogenomes) of metazoans generally contain the same set of protein-coding genes, which ensures the homology of mitochondrial genes across species. Mitochondrial genes are often used as reference data for species identification from genetic data (DNA barcoding). Demand for such reference data has been increasing with the application of environmental DNA (eDNA) analysis to environmental assessments. Recently, the number of publicly available sequence reads obtained by next-generation sequencing (NGS) has also been growing in the public NCBI Sequence Read Archive (SRA). These freely available NGS reads are promising sources for assembling the mitochondrial protein-coding genes (mPCGs) of organisms whose mitochondrial genes are not available in GenBank. The present study aimed to assemble annelid mPCGs from raw data deposited in the SRA.
Recent progress in the classification of Annelida was briefly reviewed. In the present study, the mPCGs of 32 annelid species from 19 families of clitellates and their allies in Sedentaria (echiurans and polychaetes) were newly assembled from reads deposited in the SRA. Assembly was performed with the recently published pipeline mitoRNA, which runs cycles of Bowtie2 mapping and Trinity assembly. The assembled mPCGs were deposited in GenBank as Third Party Data (TPA). A phylogenetic tree was reconstructed by maximum likelihood (ML) analysis, together with other mPCGs available in GenBank.
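The abstract describes mitoRNA only as "cycles of Bowtie2 mapping and Trinity assembly". The toy sketch below illustrates that general bait-and-extend idea, assuming each cycle uses the previous cycle's contig as the new mapping reference. It is not mitoRNA's code: the function names (`recruit_reads`, `extend_contig`, `iterative_assembly`) are hypothetical, exact shared k-mers stand in for Bowtie2 read mapping, and greedy overlap extension stands in for Trinity assembly.

```python
# Toy sketch of iterative bait-and-extend assembly (hypothetical, not mitoRNA).
# Assumption: shared exact k-mers stand in for Bowtie2 mapping, and greedy
# overlap extension stands in for Trinity assembly.


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(reads: list[str], bait: str, k: int) -> list[str]:
    """'Map' reads: keep every read sharing at least one k-mer with the bait."""
    bait_kmers = kmers(bait, k)
    return [r for r in reads if kmers(r, k) & bait_kmers]


def extend_contig(contig: str, reads: list[str], min_overlap: int) -> str:
    """'Assemble': greedily extend the contig at both ends with overlapping reads."""
    changed = True
    while changed:
        changed = False
        for r in reads:
            if r in contig:  # read already absorbed into the contig
                continue
            # Right extension: a read prefix matches a contig suffix (longest first).
            for ov in range(len(r) - 1, min_overlap - 1, -1):
                if contig.endswith(r[:ov]):
                    contig += r[ov:]
                    changed = True
                    break
            else:
                # Left extension: a read suffix matches a contig prefix.
                for ov in range(len(r) - 1, min_overlap - 1, -1):
                    if contig.startswith(r[-ov:]):
                        contig = r[:-ov] + contig
                        changed = True
                        break
    return contig


def iterative_assembly(reads: list[str], seed: str, k: int = 4,
                       min_overlap: int = 4, max_rounds: int = 10) -> str:
    """Repeat recruit -> extend, using each round's contig as the next bait."""
    contig = seed
    for _ in range(max_rounds):
        recruited = recruit_reads(reads, contig, k)
        new_contig = extend_contig(contig, recruited, min_overlap)
        if new_contig == contig:  # converged: no read extends the contig further
            break
        contig = new_contig
    return contig


if __name__ == "__main__":
    target = "ACGTTGCAAGCTTACGGATC"  # made-up sequence, not real annelid data
    reads = [target[i:i + 8] for i in range(0, 13, 3)]
    print(iterative_assembly(reads, seed="AAGCTT"))
```

With the made-up reads above, the first round recruits only the three reads that share a 4-mer with the seed `AAGCTT` and recovers the middle of the target. The two terminal reads are recruited only in the second round, once the grown contig is used as bait. This is why cycling mapping and assembly can recover more of a gene than a single mapping pass against a short or divergent reference.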
mPCG assembly was largely successful except for ; only four genes were detected from the assembled contigs of this species, probably because the deposited reads targeted its parasite. Most genes were successfully obtained, whereas atp8, nad2, and nad4l were assembled in only 22-24 species. The high nucleotide substitution rates of these genes might be related to the assembly failures, although nad6, which shows a similarly high substitution rate, was successfully assembled. Although the phylogenetic positions of several lineages remained unresolved in the present study, the relationships among some polychaetes and leeches that had not been inferred from transcriptomes were well resolved, probably owing to denser taxon sampling than in previous transcriptome-based phylogenetic analyses. Although NGS data are generally better sources for resolving phylogenetic relationships at both higher and lower taxonomic levels, there is still a need for specific mitochondrial loci in analyses that do not require high resolution, such as DNA barcoding, eDNA analysis, and phylogenetic analyses among lower taxa. Assembly from publicly available NGS reads would help design specific primers for the mitochondrial genes of species that are hard to amplify with universal primers for Sanger sequencing.