Genomic diversity of Salmonella enterica - The UoWUCC 10K genomes project.

Author information

Achtman Mark, Zhou Zhemin, Alikhan Nabil-Fareed, Tyne William, Parkhill Julian, Cormican Martin, Chiou Chien-Shun, Torpdahl Mia, Litrup Eva, Prendergast Deirdre M, Moore John E, Strain Sam, Kornschober Christian, Meinersmann Richard, Uesbeck Alexandra, Weill François-Xavier, Coffey Aidan, Andrews-Polymenis Helene, Curtiss Roy 3rd, Fanning Séamus

Affiliations

Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK.

Department of Veterinary Medicine, University of Cambridge, Cambridge, CB3 0ES, UK.

Publication information

Wellcome Open Res. 2021 Feb 1;5:223. doi: 10.12688/wellcomeopenres.16291.2. eCollection 2020.

Abstract

Most publicly available genomes of Salmonella enterica are from human disease in the US and the UK, or from domesticated animals in the US. Here we describe a historical collection of 10,000 strains isolated between 1891 and 2010 in 73 different countries. They encompass a broad range of sources, ranging from rivers through reptiles to the diversity of all S. enterica isolated on the island of Ireland between 2000 and 2005. Genomic DNA was isolated and sequenced by Illumina short-read sequencing. The short reads are publicly available in the Short Reads Archive. They were also uploaded to EnteroBase, which assembled and annotated draft genomes. The 9,769 draft genomes that passed quality control were genotyped with multiple levels of multilocus sequence typing and used to predict serovars. Genomes were assigned to hierarchical clusters on the basis of the numbers of pairwise allelic differences in core genes, and these clusters were mapped to genetic Lineages within phylogenetic trees. The University of Warwick/University College Cork (UoWUCC) project greatly extends the geographic sources, dates and core genomic diversity of publicly available S. enterica genomes. We illustrate these features by an overview of core genomic Lineages within 33,000 publicly available Salmonella genomes whose strains were isolated before 2011. We also present detailed examinations of HC400, HC900 and HC2000 hierarchical clusters within exemplar Lineages, including serovars Typhimurium, Enteritidis and Mbandaka. These analyses confirm the polyphyletic nature of multiple serovars while showing that discrete clusters with geographical specificity can be reliably recognized by hierarchical clustering approaches. The results also demonstrate that the genomes sequenced here provide an important counterbalance to the sampling bias that dominates current genomic sequencing.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2bc/7869176/246b3cee4df0/wellcomeopenres-5-18280-g0000.jpg
