Genome Informatics and Machine Learning-Based Identification of Antimicrobial Resistance-Encoding Features and Virulence Attributes in Escherichia coli Genomes Representing Globally Prevalent Lineages, Including High-Risk Clonal Complexes.

Affiliation

Pathogen Biology Laboratory, Department of Biotechnology and Bioinformatics, University of Hyderabad, Hyderabad, India.

Publication information

mBio. 2022 Feb 22;13(1):e0379621. doi: 10.1128/mbio.03796-21. Epub 2022 Feb 15.

DOI:10.1128/mbio.03796-21

PMID:35164570

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8844930/

Abstract

Escherichia coli, a ubiquitous commensal/pathogenic member of the family Enterobacteriaceae, accounts for high infection burden, morbidity, and mortality throughout the world. With emerging multidrug resistance (MDR) on a massive scale, E. coli has been listed as one of the Global Antimicrobial Resistance and Use Surveillance System (GLASS) priority pathogens. Understanding the resistance mechanisms and underlying genomic features appears to be of utmost importance to tackle further spread of these multidrug-resistant superbugs. While a few of the globally prevalent sequence types (STs) of E. coli, such as ST131, ST69, ST405, and ST648, have been previously reported to be highly virulent and to harbor MDR, it remains unclear whether certain ST lineages have a greater propensity to acquire MDR. In this study, large-scale comparative genomics of a total of 5,653 E. coli genomes from 19 ST lineages revealed ST-wide prevalence patterns of genomic features, such as antimicrobial resistance (AMR)-encoding genes/mutations, virulence genes, integrons, and transposons. Interpretation of the importance of these features using a Random Forest Classifier trained with 11,988 genomic features from whole-genome sequence data identified ST-specific or phylogroup-specific signature proteins mostly belonging to different protein superfamilies, including the toxin-antitoxin systems. Our study provides a comprehensive understanding of a myriad of genomic features, ST-specific proteins, and resistance mechanisms entailing different lineages of E. coli at the level of genomes; this could be of significant downstream importance in understanding the mechanisms of AMR, in clinical discovery, in epidemiology, and in devising control strategies.

With the leap in whole-genome data being generated, the application of relevant methods to mine biologically significant information from microbial genomes is of utmost importance to public health genomics. Machine-learning methods have been used not only to mine, curate, or classify the data but also to identify the relevant features that could be linked to a particular class/target. This is perhaps one of the pioneering studies that has attempted to classify a large repertoire of E. coli genome data sets (5,653 genomes) belonging to 19 different STs (including well-studied as well as understudied STs) using machine learning approaches. Important features identified by these approaches have revealed ST-specific signature proteins, which could be further studied to predict possible associations with the phenotypic profiles, thereby providing a better understanding of virulence and the resistance mechanisms among different clonal lineages of E. coli.
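The two analytical ideas named in the abstract, ST-wide feature prevalence and feature importance for separating STs, can be illustrated with a small standard-library sketch. This is not the authors' pipeline: the study trained a Random Forest classifier on 11,988 features, whereas the code below only computes the Gini impurity decrease of a single presence/absence split, which is the per-split quantity that impurity-based random forest importance aggregates over many trees. The genome table, the ST labels (`ST_a`, `ST_b`, `ST_c`), the feature names, and both function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (ST label, set of genomic features detected in that genome).
genomes = [
    ("ST_a", {"feature_A", "feature_C"}),
    ("ST_a", {"feature_A", "feature_C"}),
    ("ST_b", {"feature_B", "feature_C"}),
    ("ST_b", {"feature_B"}),
    ("ST_c", set()),
    ("ST_c", {"feature_C"}),
]


def prevalence_by_st(genomes):
    """Return, for each feature, the fraction of genomes in each ST that carry it."""
    totals = Counter(st for st, _ in genomes)
    counts = defaultdict(Counter)
    for st, feats in genomes:
        for feat in feats:
            counts[feat][st] += 1
    return {
        feat: {st: counts[feat][st] / totals[st] for st in totals}
        for feat in counts
    }


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(genomes, feature):
    """Impurity decrease from splitting genomes on presence/absence of one feature."""
    all_labels = [st for st, _ in genomes]
    present = [st for st, feats in genomes if feature in feats]
    absent = [st for st, feats in genomes if feature not in feats]
    n = len(all_labels)
    weighted = (len(present) / n) * gini(present) + (len(absent) / n) * gini(absent)
    return gini(all_labels) - weighted


prevalence = prevalence_by_st(genomes)
features = sorted(prevalence)
# Rank features by how well a single split on them separates the STs;
# ties are broken alphabetically so the order is deterministic.
ranked = sorted(
    ((feat, gini_decrease(genomes, feat)) for feat in features),
    key=lambda item: (-item[1], item[0]),
)

for feat, score in ranked:
    print(f"{feat}: gini decrease = {score:.3f}, prevalence = {prevalence[feat]}")
```

In this toy table, a feature confined to one ST scores higher than a feature spread across all STs, which is the intuition behind reading highly ranked features as candidate ST-specific signatures. A real analysis would use a full classifier with held-out evaluation rather than single-split scores.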

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faef/8844930/c7e077e3701c/mbio.03796-21-f001.jpg
