Sonett Dylan, Brown Tanya, Bengtsson-Palme Johan, Padilla-Gamiño Jacqueline L, Zaneveld Jesse R
Department of Pharmacy, School of Pharmacy, University of Washington, Seattle, WA, United States.
University of Washington, Division of Biological Sciences, School of Science, Technology, Engineering, and Mathematics, Bothell, WA, United States.
ISME Commun. 2024 Sep 24;4(1):ycae114. doi: 10.1093/ismeco/ycae114. eCollection 2024 Jan.
The genomes of mitochondria and chloroplasts contain ribosomal RNA (rRNA) genes, reflecting their ancestry as free-living bacteria. These organellar rRNAs are often amplified in microbiome studies of animals and plants. If identified, they can be discarded, merely reducing sequencing depth. However, we identify certain high-abundance organellar rRNAs that common pipelines fail to identify, which may compromise statistical analysis of microbiome structure and diversity. We quantified this by reanalyzing 7459 samples from seven 16S rRNA studies, including microbiomes from 927 unique animal genera. We find that under-annotation of cryptic mitochondrial and chloroplast reads affects several of these large-scale cross-species microbiome comparisons and varies between host species, biasing comparisons. We offer a straightforward solution: supplementing existing taxonomies with diverse organelle rRNA sequences. This resolves up to 97% of unique unclassified sequences in some entire studies as mitochondrial (14% averaged across all studies), without increasing false positive annotations in mitochondria-free mock communities. Improved annotation decreases the proportion of unknown sequences by ≥10-fold in 2262 of 7459 samples (30%), spanning five of seven major studies examined. We recommend leveraging organelle sequence diversity to better identify organelle gene sequences in microbiome studies, and provide code, data resources and tutorials that implement this approach.
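The ≥10-fold criterion reported above can be illustrated with a minimal sketch. This is not the authors' code: the function names `fold_reduction` and `fraction_improved` are hypothetical, and the toy per-sample values are invented purely for illustration. Only the counts 2262 and 7459 come from the abstract.

```python
import math


def fold_reduction(unknown_before: float, unknown_after: float) -> float:
    """Fold decrease in the unknown-sequence fraction of one sample.

    Both arguments are proportions in [0, 1]: the share of reads left
    unclassified before and after improved organelle annotation.
    """
    if unknown_before == 0:
        return 1.0  # nothing was unknown, so nothing could improve
    if unknown_after == 0:
        return math.inf  # every unknown read was resolved
    return unknown_before / unknown_after


def fraction_improved(samples, threshold=10.0):
    """Share of samples whose unknown fraction fell by at least `threshold`-fold."""
    improved = sum(1 for before, after in samples
                   if fold_reduction(before, after) >= threshold)
    return improved / len(samples)


# Toy (before, after) pairs, assumed for illustration only.
toy_samples = [(0.50, 0.04), (0.20, 0.19), (0.30, 0.0), (0.0, 0.0)]
toy_share = fraction_improved(toy_samples)

# Counts taken from the abstract: 2262 of 7459 samples met the criterion.
reported_share = 2262 / 7459

print(f"toy share: {toy_share:.2f}")
print(f"reported share: {reported_share:.1%}")
```

Treating a sample with no unknown reads as a 1-fold change (no improvement) is a design choice made for this sketch; the abstract does not say how such samples were handled.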