Molecular Microbiology and Genomics Consultants, Tannenstrasse 7, 55576, Zotzenheim, Germany.
Arkansas Center for Genomic Epidemiology and Medicine, Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W. Markham Str., Slot 782, Little Rock, AR, 72205, USA.
Microb Ecol. 2018 Oct;76(3):801-813. doi: 10.1007/s00248-018-1155-7. Epub 2018 Feb 14.
Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a wealth of information, but they need to be analyzed and compared in a way that makes the outcome useful for clinicians and epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes form four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, whether based on partial or full-length gene sequences, results in biased estimates of genetic differences and does not capture the true degree of similarity or difference between complete genomes. The presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from the other 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters, and only one cluster consistently lacked the toxin A/B genes. A total of 310 genomes (47%) contained ToxA/B without CDT. Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test whether metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterizing the most probable causative strain in CDI patients.
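The dependence of sub-clusters on the chosen AAI cutoff can be illustrated with a minimal sketch. The abstract does not state which clustering procedure was used, so the single-linkage grouping below is an assumption, and the genome labels (`g1` to `g4`), the AAI percentages, and the function name `cluster_by_cutoff` are all hypothetical values invented for illustration, not data from the study.

```python
from itertools import combinations


def cluster_by_cutoff(names, aai, cutoff):
    """Group genomes by single linkage: two genomes share a cluster
    when a chain of pairwise AAI scores >= cutoff connects them.
    (Assumed procedure; the abstract does not specify the method.)"""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if aai[frozenset((a, b))] >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical genome labels and pairwise AAI percentages.
names = ["g1", "g2", "g3", "g4"]
aai = {
    frozenset(("g1", "g2")): 99.5,
    frozenset(("g1", "g3")): 98.0,
    frozenset(("g2", "g3")): 98.2,
    frozenset(("g1", "g4")): 90.0,
    frozenset(("g2", "g4")): 90.5,
    frozenset(("g3", "g4")): 91.0,
}

loose = cluster_by_cutoff(names, aai, 95.0)   # g1, g2, g3 together; g4 apart
strict = cluster_by_cutoff(names, aai, 99.0)  # only g1 and g2 stay together
print(loose)
print(strict)
```

With these invented scores, raising the cutoff from 95 to 99 splits the three-genome group into a pair and a singleton, which is the kind of cutoff-dependent subdivision the abstract describes for the main C. difficile cluster.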