Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, UK.
EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
Microb Genom. 2021 Feb;7(2). doi: 10.1099/mgen.0.000499.
Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found worldwide across many niches. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. Together, these features have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data are widely available, accessing the information in order to perform analyses can still be a challenge. Collecting the relevant data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing before analyses can be applied and useful information extracted. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.