The Ribosomal Operon Database: A Full-Length rDNA Operon Database Derived From Genome Assemblies.

Authors

Krabberød Anders K, Stokke Embla, Thoen Ella, Skrede Inger, Kauserud Håvard

Affiliation

Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway.

Publication information

Mol Ecol Resour. 2025 Jan;25(1):e14031. doi: 10.1111/1755-0998.14031. Epub 2024 Oct 21.

Abstract

Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16S/18S marker or the internal transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase the phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the ribosomal operon database (ROD), which includes eukaryotic full-length rDNA operons extracted from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which may reflect intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947 in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. The operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S and its V4 and V9 regions being the most conserved. The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.
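The headline numbers in the abstract imply a few derived quantities that are not stated directly. The following is a rough sketch of that arithmetic only, not part of the authors' pipeline. It assumes that the 53.1% figure refers to the assemblies in which a full-length operon was detected (the abstract does not say so explicitly), and all constant and function names are illustrative. Because the percentages are rounded in the source, the resulting counts are approximate.

```python
# Back-of-the-envelope figures derived from numbers quoted in the abstract.
# Names are illustrative; the percentages are rounded in the source, so the
# derived counts are approximations, not values reported by the authors.

ASSEMBLIES_EXAMINED = 34_701       # eukaryotic genome assemblies from NCBI
FRACTION_WITH_OPERON = 0.341       # assemblies with a full-length operon
FRACTION_MULTI_VARIANT = 0.531     # ASSUMPTION: share of the assemblies above
UNIQUE_OPERONS = 453_697           # unique operons detected
VARIANT_CLUSTERS = 69_480          # clusters after 99% intragenomic clustering
MIN_LENGTH_BP = 4_136              # shortest operon
MAX_LENGTH_BP = 16_463             # longest operon


def derived_figures() -> dict:
    """Return approximate quantities implied by the abstract's numbers."""
    with_operon = round(ASSEMBLIES_EXAMINED * FRACTION_WITH_OPERON)
    multi_variant = round(with_operon * FRACTION_MULTI_VARIANT)
    return {
        "assemblies_with_operon": with_operon,
        "assemblies_with_multiple_variants": multi_variant,
        # Average number of unique operons collapsed into each cluster.
        "operons_per_cluster": round(UNIQUE_OPERONS / VARIANT_CLUSTERS, 2),
        # How many times longer the longest operon is than the shortest.
        "length_fold_range": round(MAX_LENGTH_BP / MIN_LENGTH_BP, 2),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions, roughly 11,800 assemblies yielded a full-length operon, clustering at 99% identity collapsed about 6.5 unique operons into each variant cluster on average, and the longest operon is about four times the length of the shortest.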

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/244b/11646303/72896918e807/MEN-25-e14031-g001.jpg
