The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA.

Author information

Chalka Antonia, Dallman Tim J, Vohra Prerna, Stevens Mark P, Gally David L

Affiliations

The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK.

Institute for Risk Assessment Sciences (IRAS), University of Utrecht, Heidelberglaan, Utrecht, Netherlands.

Publication information

Microb Genom. 2023 Oct;9(10). doi: 10.1099/mgen.0.001116.

DOI:10.1099/mgen.0.001116
PMID:37843883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10634445/
Abstract

Salmonella is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts, including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists with the potential to colonize a wide variety of species. However, even within generalist serovars such as STm, it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and to predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and comparing them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs performed best at predicting the host of origin of isolates and outperformed both nearest-neighbour phylogenetic host prediction and models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source.
Notably, IGRs were the feature with the best performance across multiple models, which may be because IGRs both represent their flanking genes, equivalent to PVs, and capture genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45 % of human STm infections in the USA originate from bovine, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models with IGRs and PVs as features compared with SNP-based and core-genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.
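The abstract names Random Forest models trained on genomic features and an attribution step that assigns human isolates to animal hosts. Below is a minimal, standard-library Python sketch of that general technique, assuming each isolate is encoded as a 0/1 vector of variant presence/absence (a hypothetical encoding of IGR or PV variants) labelled with its animal host. It is not the authors' pipeline: the function names, the synthetic data, the tree depth, the number of trees and the sqrt(#features) subset size are all illustrative assumptions, and the attribution step simply counts majority-vote predictions, which may differ from how the reported percentages were computed.

```python
"""Toy Random Forest host attribution (hypothetical setup, not the authors' pipeline).

Each isolate is a 0/1 vector marking presence/absence of variants; each
training label is the animal host of origin.
"""
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of host labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow one tree. A leaf is a host label; an internal node is (feature, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    # Random Forest step: only a random subset of n_sub features is tried at each split.
    for f in rng.sample(range(len(X[0])), n_sub):
        left = [lab for row, lab in zip(X, y) if row[f] == 0]
        right = [lab for row, lab in zip(X, y) if row[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:  # no usable split among the sampled features
        return Counter(y).most_common(1)[0][0]
    f = best[1]
    li = [i for i, row in enumerate(X) if row[f] == 0]
    ri = [i for i, row in enumerate(X) if row[f] == 1]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_sub, rng)

    return (f, grow(li), grow(ri))


def predict_tree(node, row):
    """Follow splits until a leaf (a host label) is reached."""
    while isinstance(node, tuple):
        f, left, right = node
        node = right if row[f] == 1 else left
    return node


def train_forest(X, y, n_trees=25, max_depth=4, n_sub=None, seed=0):
    """Bagging plus random feature subsets; n_sub defaults to sqrt(#features)."""
    rng = random.Random(seed)
    if n_sub is None:
        n_sub = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def attribution(forest, human_rows):
    """Fraction of human isolates assigned to each animal host."""
    preds = Counter(predict_forest(forest, row) for row in human_rows)
    return {host: n / len(human_rows) for host, n in preds.items()}


def make_toy_data(n_per_host=30, n_noise=4, seed=1):
    """Synthetic data: feature 0 tags bovine, feature 1 tags poultry, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for host in ("bovine", "poultry", "swine"):
        for _ in range(n_per_host):
            X.append([int(host == "bovine"), int(host == "poultry")]
                     + [rng.randint(0, 1) for _ in range(n_noise)])
            y.append(host)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    forest = train_forest(X, y)
    humans = [[1, 0, 0, 1, 0, 1], [0, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 0]]
    print(attribution(forest, humans))
```

Running the file prints the share of the three toy "human" isolates assigned to each host. In practice a maintained implementation such as scikit-learn's `RandomForestClassifier` would replace the hand-written trees; they are spelled out here only to make the bootstrap sampling and per-split feature subsampling visible.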

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/eaf2b32421d2/mgen-9-1116-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/443f1582ab98/mgen-9-1116-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/8e592e5e424b/mgen-9-1116-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/f44c7fc5f42b/mgen-9-1116-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/d54e3fab1be3/mgen-9-1116-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/28cff938b527/mgen-9-1116-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41a2/10634445/e5c982f355b6/mgen-9-1116-g007.jpg

Similar articles

1
The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA.
Microb Genom. 2023 Oct;9(10). doi: 10.1099/mgen.0.001116.
2
Genomic Analysis of Salmonella enterica Serovar Typhimurium from Wild Passerines in England and Wales.
Appl Environ Microbiol. 2016 Oct 27;82(22):6728-6735. doi: 10.1128/AEM.01660-16. Print 2016 Nov 15.
3
Whole genome sequencing analysis of multiple Salmonella serovars provides insights into phylogenetic relatedness, antimicrobial resistance, and virulence markers across humans, food animals and agriculture environmental sources.
BMC Genomics. 2018 Nov 6;19(1):801. doi: 10.1186/s12864-018-5137-4.
4
Comparative Genomic Analysis of Salmonella enterica Serovar Typhimurium Isolates from Passerines Reveals Two Lineages Circulating in Europe, New Zealand, and the United States.
Appl Environ Microbiol. 2022 May 10;88(9):e0020522. doi: 10.1128/aem.00205-22. Epub 2022 Apr 18.
5
Patchy promiscuity: machine learning applied to predict the host specificity of and .
Microb Genom. 2017 Oct 3;3(10):e000135. doi: 10.1099/mgen.0.000135. eCollection 2017 Oct.
6
Defining the Core Genome of Salmonella enterica Serovar Typhimurium for Genomic Surveillance and Epidemiological Typing.
J Clin Microbiol. 2015 Aug;53(8):2530-8. doi: 10.1128/JCM.03407-14. Epub 2015 May 27.
7
Salmonella enterica Serovar Typhimurium Isolates from Wild Birds in the United States Represent Distinct Lineages Defined by Bird Type.
Appl Environ Microbiol. 2022 Mar 22;88(6):e0197921. doi: 10.1128/AEM.01979-21. Epub 2022 Feb 2.
8
A guide to machine learning for bacterial host attribution using genome sequence data.
Microb Genom. 2019 Dec;5(12). doi: 10.1099/mgen.0.000317.
9
Whole-Genome Sequencing of Drug-Resistant Salmonella enterica Isolates from Dairy Cattle and Humans in New York and Washington States Reveals Source and Geographic Associations.
Appl Environ Microbiol. 2017 May 31;83(12). doi: 10.1128/AEM.00140-17. Print 2017 Jun 15.
10
Development and validation of a random forest algorithm for source attribution of animal and human Salmonella Typhimurium and monophasic variants of Salmonella Typhimurium isolates in England and Wales utilising whole genome sequencing data.
Front Microbiol. 2024 Mar 12;14:1254860. doi: 10.3389/fmicb.2023.1254860. eCollection 2023.

Cited by

1
Whole-genome phenotype prediction with machine learning: open problems in bacterial genomics.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf206.
2
Development of a logic regression-based approach for the discovery of host- and niche-informative biomarkers in and their application for microbial source tracking.
Appl Environ Microbiol. 2024 Jul 24;90(7):e0022724. doi: 10.1128/aem.00227-24. Epub 2024 Jun 28.
3
Predictive phage therapy for urinary tract infections: Cocktail selection for therapy based on machine learning models.
Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2313574121. doi: 10.1073/pnas.2313574121. Epub 2024 Mar 13.