Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data.

Author information

Menzies School of Health Research, Child Health Division, Charles Darwin University, Darwin, NT Australia ; School of Medicine, Flinders University, Bedford Park, Adelaide, SA Australia ; Infection and Immunity Theme, South Australia Health and Medical Research Institute, North Terrace, Adelaide, SA Australia.

Infection and Immunity Theme, South Australia Health and Medical Research Institute, North Terrace, Adelaide, SA Australia.

Publication information

Microbiome. 2015 May 5;3:19. doi: 10.1186/s40168-015-0083-8. eCollection 2015.

Abstract

BACKGROUND

The rapid expansion of 16S rRNA gene sequencing in challenging clinical contexts has resulted in a growing body of literature of variable quality. To a large extent, this is due to a failure to address spurious signal that is characteristic of samples with low levels of bacteria and high levels of non-bacterial DNA. We have developed a workflow based on the paired-end read Illumina MiSeq-based approach, which enables significant improvement in data quality, post-sequencing. We demonstrate the efficacy of this methodology through its application to paediatric upper-respiratory samples from several anatomical sites.

RESULTS

A workflow for processing sequence data was developed based on commonly available tools. Data generated from different sample types showed a marked variation in levels of non-bacterial signal and 'contaminant' bacterial reads. Significant differences in the ability of reference databases to accurately assign identity to operational taxonomic units (OTU) were observed. Three OTU-picking strategies were trialled as follows: de novo, open-reference and closed-reference, with open-reference performing substantially better. Relative abundance of OTUs identified as potential reagent contamination showed a strong inverse correlation with amplicon concentration allowing their objective removal. The removal of the spurious signal showed the greatest improvement in sample types typically containing low levels of bacteria and high levels of human DNA. A substantial impact of pre-filtering data and spurious signal removal was demonstrated by principal coordinate and co-occurrence analysis. For example, analysis of taxon co-occurrence in adenoid swab and middle ear fluid samples indicated that failure to remove the spurious signal resulted in the inclusion of six out of eleven bacterial genera that accounted for 80% of similarity between the sample types.
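The contaminant-removal step described above can be illustrated with a minimal sketch. It assumes a Spearman rank correlation and a cutoff of -0.7; the abstract specifies neither the statistic nor a threshold. The function names, OTU labels, and all numeric values below are hypothetical and serve only to show the idea: an OTU whose relative abundance rises as amplicon concentration falls is flagged as likely reagent contamination.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no rank information
    return cov / (sx * sy)


def flag_contaminants(rel_abundance, amplicon_conc, rho_cutoff=-0.7):
    """Return OTUs whose abundance is strongly inversely correlated with
    amplicon concentration. rho_cutoff is an assumed, illustrative value."""
    return sorted(
        otu
        for otu, abundances in rel_abundance.items()
        if spearman(abundances, amplicon_conc) <= rho_cutoff
    )


# Hypothetical data: five samples ordered by increasing amplicon concentration.
amplicon_conc = [0.5, 1.0, 2.0, 4.0, 8.0]
rel_abundance = {
    "OTU_reagent": [0.60, 0.40, 0.20, 0.10, 0.02],   # falls as concentration rises
    "OTU_resident": [0.10, 0.30, 0.25, 0.50, 0.60],  # rises with concentration
}

flagged = flag_contaminants(rel_abundance, amplicon_conc)
print(flagged)
```

In this toy input only the OTU whose abundance falls steadily with amplicon concentration is flagged. In practice the cutoff would need to be justified against negative controls rather than taken from this sketch.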

CONCLUSIONS

The application of the presented workflow to a set of challenging clinical samples demonstrates its utility in removing the spurious signal from the dataset, allowing clinical insight to be derived from what would otherwise be highly misleading output. While other approaches could potentially achieve similar improvements, the methodology employed here represents an accessible means to exclude the signal from contamination and other artefacts.
