Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

Affiliations

MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.

Publication information

mBio. 2020 Jul 7;11(4):e01344-20. doi: 10.1128/mBio.01344-20.

PMID:32636251

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7343994/

Abstract

Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. The variants selected by our joint modeling approach overlap substantially with those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold.

Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
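To make the modeling idea in the abstract concrete, the following is a minimal sketch of an elastic net regression fitted jointly to a unitig presence/absence matrix. It is not the authors' software. The toy data, the function names (`soft_threshold`, `fit_elastic_net`, `predict`), the penalty values (`lam`, `alpha`), and the use of a linear rather than logistic model are all assumptions chosen for brevity.

```python
# Illustrative only: a tiny pure-Python elastic net on hypothetical data.
# Rows are isolates, columns are unitigs (1 = present, 0 = absent).

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=500):
    """Coordinate descent for the objective
    (1/2n) * sum((y - b0 - X.beta)^2)
      + lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2),
    with an unpenalized intercept b0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    residual = [y[i] - intercept for i in range(n)]
    for _ in range(n_iter):
        # Re-centre: the intercept absorbs the mean of the residuals.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # unitig absent from every isolate
            # Correlation of unitig j with the residual that excludes its own effect.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_b
    return intercept, beta


def predict(intercept, beta, row):
    """Predicted phenotype score for one isolate's unitig profile."""
    return intercept + sum(b * x for b, x in zip(beta, row))


# Hypothetical data: unitig 0 tracks the phenotype, unitigs 1 and 3 are
# unrelated, and unitig 2 is present in every isolate (core genome).
X = [
    [1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1],
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # e.g. 1 = resistant, 0 = susceptible

intercept, beta = fit_elastic_net(X, y)
selected = [j for j, b in enumerate(beta) if abs(b) > 1e-8]
print("selected unitigs:", selected)
print("coefficients:", [round(b, 3) for b in beta])
```

All unitigs enter one model at once, and the L1 part of the penalty sets uninformative coefficients to exactly zero. The non-zero coefficients therefore serve both as the list of associated variants and as a portable predictor, in contrast to testing each variant separately against a significance threshold.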

利用可解释的泛基因组跨越回归提高细菌基因型-表型关联的预测能力。

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

机构信息

MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.

出版信息

mBio. 2020 Jul 7;11(4):e01344-20. doi: 10.1128/mBio.01344-20.

DOI:10.1128/mBio.01344-20

PMID:32636251

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC7343994/

Abstract

摘要

利用可解释的泛基因组跨越回归提高细菌基因型-表型关联的预测能力。

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献

文献AI研究员

用中文搜PubMed

文档翻译

Suppr 超能文献

利用可解释的泛基因组跨越回归提高细菌基因型-表型关联的预测能力。

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献