Castro Juan C, Rodriguez-R Luis M, Harvey William T, Weigand Michael R, Hatt Janet K, Carter Michelle Q, Konstantinidis Konstantinos T
Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology, Atlanta, GA, United States of America.
School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States of America.
PeerJ. 2018 Nov 2;6:e5882. doi: 10.7717/peerj.5882. eCollection 2018.
Accurate detection of target microbial species in metagenomic datasets from environmental samples remains limited for two reasons: the limit of detection of current methods is typically inaccessible, and the frequency of false positives remains unknown. False positives result from inadequate identification of regions of the genome that are either too highly conserved to be diagnostic (e.g., rRNA genes) or prone to frequent horizontal genetic exchange (e.g., mobile elements). To overcome these limitations, we introduce imGLAD, which detects target genomic sequences in metagenomic datasets. imGLAD achieves high accuracy because it (i) uses the sequence-discrete population concept to discriminate metagenomic reads originating from the target organism from reads of co-occurring close relatives, (ii) masks uninformative regions of the genome using the MyTaxa engine, and (iii) models both sequencing breadth and depth to determine relative abundance and limit of detection. We validated imGLAD by analyzing metagenomic datasets derived from spinach leaves inoculated with the enteric pathogen Escherichia coli O157:H7 and showed that its limit of detection can be comparable to that of PCR-based approaches for these samples (∼1 cell/gram).
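The two quantities the abstract says are modeled, sequencing breadth (fraction of the target genome covered by at least one read) and sequencing depth (mean number of reads per position), can be illustrated with a minimal sketch. This is not imGLAD's implementation: the function names, the half-open interval representation of mapped reads, and the toy data are assumptions made for illustration. The comparison against 1 - exp(-depth) is the standard expectation for reads placed uniformly at random on a genome, and is included only to show why breadth and depth together are more informative than either alone.

```python
import math


def breadth_and_depth(read_intervals, genome_length):
    """Return (breadth, depth) for reads mapped to a single reference.

    read_intervals: iterable of half-open (start, end) positions on the
    reference (hypothetical input format, chosen for this sketch).
    """
    coverage = [0] * genome_length
    for start, end in read_intervals:
        # Clip each read to the reference boundaries before counting.
        for pos in range(max(start, 0), min(end, genome_length)):
            coverage[pos] += 1
    covered_positions = sum(1 for c in coverage if c > 0)
    breadth = covered_positions / genome_length  # fraction of genome covered
    depth = sum(coverage) / genome_length        # mean reads per position
    return breadth, depth


def expected_breadth(depth):
    """Breadth expected if reads were placed uniformly at random."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Toy data: three 10-bp reads on a 100-bp reference, two overlapping.
    reads = [(0, 10), (5, 15), (50, 60)]
    breadth, depth = breadth_and_depth(reads, genome_length=100)
    print(f"breadth={breadth:.2f} depth={depth:.2f} "
          f"expected_breadth={expected_breadth(depth):.3f}")
```

An observed breadth well below the value expected for the observed depth suggests that reads pile up on a few regions (for example conserved or mobile elements) rather than sampling the whole target genome, which is the kind of signal that masking uninformative regions is meant to suppress.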