A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter.

Authors

Méric Guillaume, Yahara Koji, Mageiros Leonardos, Pascoe Ben, Maiden Martin C J, Jolley Keith A, Sheppard Samuel K

Affiliations

Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom.

Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom; Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan.

Publication

PLoS One. 2014 Mar 27;9(3):e92798. doi: 10.1371/journal.pone.0092798. eCollection 2014.

Abstract

The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating them to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation, that is, genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates, a 7-isolate reference pan-genome of 3,933 loci was defined. A core genome of 1,035 genes was ubiquitous in the sample, accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci, with gene presence defined as a BLAST match of ≥ 70% identity over ≥ 50% of the locus length; matching sequences were aligned gene by gene using MUSCLE. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
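The screening logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy genome and locus identifiers, and the input layout are all hypothetical. It also assumes that coverage is the alignment length divided by the locus length, and that "present only in" a species means found in at least one genome of that species and in none of the other; the abstract does not spell out either detail.

```python
from typing import Dict, Set, Tuple

MIN_IDENTITY = 70.0  # percent identity threshold stated in the abstract
MIN_COVERAGE = 0.50  # fraction of locus length stated in the abstract


def is_present(percent_identity: float, alignment_length: int, locus_length: int) -> bool:
    """Return True if a BLAST hit meets both thresholds.

    Assumption: coverage = alignment_length / locus_length.
    """
    if locus_length <= 0:
        return False
    coverage = alignment_length / locus_length
    return percent_identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def partition_loci(
    presence: Dict[str, Set[str]], all_loci: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split loci into core (present in every genome) and accessory (the rest)."""
    core = set(all_loci)
    for loci in presence.values():
        core &= loci
    return core, all_loci - core


def species_specific(
    presence: Dict[str, Set[str]], species: Dict[str, str], target: str
) -> Set[str]:
    """Loci seen in at least one `target` genome and in no genome of another species."""
    in_target: Set[str] = set()
    in_other: Set[str] = set()
    for genome, loci in presence.items():
        if species[genome] == target:
            in_target |= loci
        else:
            in_other |= loci
    return in_target - in_other


# Toy data (hypothetical): which reference loci passed is_present() in each genome.
presence = {
    "jejuni_1": {"locus1", "locus2"},
    "jejuni_2": {"locus1", "locus2", "locus4"},
    "coli_1": {"locus1", "locus3"},
}
species = {"jejuni_1": "C. jejuni", "jejuni_2": "C. jejuni", "coli_1": "C. coli"}
all_loci = {"locus1", "locus2", "locus3", "locus4"}

core, accessory = partition_loci(presence, all_loci)
jejuni_only = species_specific(presence, species, "C. jejuni")
coli_only = species_specific(presence, species, "C. coli")
```

In this toy example `locus1` is the only core locus, `locus2` and `locus4` are specific to C. jejuni, and `locus3` is specific to C. coli. A real analysis would fill `presence` by applying `is_present` to BLAST results for each of the 192 genomes against every reference pan-genome locus.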


