Learning From Limited Data: Towards Best Practice Techniques for Antimicrobial Resistance Prediction From Whole Genome Sequencing Data.

Affiliations

Ares Genetics GmbH, Vienna, Austria.

Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria.

Publication details

Front Cell Infect Microbiol. 2021 Feb 15;11:610348. doi: 10.3389/fcimb.2021.610348. eCollection 2021.

DOI: 10.3389/fcimb.2021.610348
PMID: 33659219
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7917081/
Abstract

Antimicrobial resistance prediction from whole genome sequencing (WGS) data is an emerging application of machine learning, promising to improve antimicrobial resistance surveillance and outbreak monitoring. Despite significant reductions in sequencing cost, the availability and sampling diversity of WGS data with matched antimicrobial susceptibility testing (AST) profiles, which are required for training WGS-AST prediction models, remain limited. Best practice machine learning techniques are required to ensure trained models generalize to independent data for optimal predictive performance. Limited data restricts the choice of machine learning training and evaluation methods and can result in overestimation of model performance. We demonstrate that the widely used random k-fold cross-validation method is ill-suited for application to small bacterial genomics datasets and offer an alternative cross-validation method based on genomic distance. We benchmarked three machine learning architectures previously applied to the WGS-AST problem on a set of 8,704 genome assemblies from five clinically relevant pathogens across 77 species-compound combinations collated from public databases. We show that individual models can be effectively ensembled to improve model performance. By combining models via stacked generalization with cross-validation, a model ensembling technique suitable for small datasets, we improved average sensitivity and specificity of individual models by 1.77% and 3.20%, respectively. Furthermore, stacked models exhibited improved robustness and were thus less prone to outlier performance drops than individual component models. In this study, we highlight best practice techniques for antimicrobial resistance prediction from WGS data and introduce the combination of genome-distance-aware cross-validation and stacked generalization for robust and accurate WGS-AST.
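
The cross-validation method based on genomic distance can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed symmetric pairwise genome distance matrix (for example Mash-style distances, where 0 means identical), groups genomes by single-linkage clustering at a hypothetical distance threshold, and assigns whole clusters to folds so that near-identical isolates never sit on both sides of a train/test split. The function names, the threshold value and the greedy fold balancing are illustrative assumptions.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def single_linkage_clusters(dist: Matrix, threshold: float) -> List[List[int]]:
    """Cluster genomes so that any pair at distance <= threshold ends up together (single linkage)."""
    n = len(dist)
    parent = list(range(n))  # union-find forest over genome indices

    def find(i: int) -> int:
        # Walk to the root, halving the path on the way to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def distance_aware_folds(dist: Matrix, threshold: float, k: int) -> List[List[int]]:
    """Split genome indices into k folds without ever separating members of one cluster."""
    clusters = sorted(single_linkage_clusters(dist, threshold), key=len, reverse=True)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cluster in clusters:
        # Greedy balancing: the next-largest cluster goes into the currently smallest fold.
        target = min(range(k), key=lambda f: len(folds[f]))
        folds[target].extend(cluster)
    return [sorted(fold) for fold in folds]


# Toy pairwise distances for six genomes: two tight lineages plus one outlier.
dist = [
    [0.00, 0.01, 0.02, 0.30, 0.31, 0.50],
    [0.01, 0.00, 0.01, 0.29, 0.30, 0.52],
    [0.02, 0.01, 0.00, 0.28, 0.30, 0.49],
    [0.30, 0.29, 0.28, 0.00, 0.02, 0.45],
    [0.31, 0.30, 0.30, 0.02, 0.00, 0.47],
    [0.50, 0.52, 0.49, 0.45, 0.47, 0.00],
]
folds = distance_aware_folds(dist, threshold=0.05, k=3)
print(folds)  # [[0, 1, 2], [3, 4], [5]]
```

Random k-fold splitting could place genomes 0 and 1 in different folds, so a model would be tested on an isolate nearly identical to one it was trained on; keeping each lineage inside a single fold makes the held-out data closer to genuinely unseen strains, at the cost of less balanced fold sizes when one lineage dominates.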

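Stacked generalization with cross-validation can be illustrated with a small sketch. It assumes binary labels (1 = resistant), base learners that return an estimated probability of resistance, and a logistic-regression meta-learner; the abstract does not state which meta-learner the paper uses, and `presence_learner`, the toy marker data and the fold assignment below are hypothetical. The essential step is that the meta-learner is trained only on out-of-fold predictions, so it learns how to weight the base models from scores on samples they did not train on; the folds themselves could come from a genome-distance-aware split.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]                     # returns an estimated P(resistant)
Learner = Callable[[List[Vector], List[int]], Model]  # trains on (X, y) and returns a Model


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def out_of_fold_predictions(learners: Sequence[Learner], X: List[Vector], y: List[int],
                            folds: Sequence[Sequence[int]]) -> List[Vector]:
    """Level-1 features: each sample is scored only by base models that never trained on it.
    Assumes the folds partition the sample indices."""
    meta_X = [[0.0] * len(learners) for _ in X]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        for m, learn in enumerate(learners):
            model = learn([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                meta_X[i][m] = model(X[i])
    return meta_X


def fit_logistic(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 2000) -> Model:
    """Minimal batch gradient-descent logistic regression, used here as the meta-learner."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_stacked(learners: Sequence[Learner], X: List[Vector], y: List[int],
                folds: Sequence[Sequence[int]]) -> Model:
    """Train the meta-learner on out-of-fold scores, then refit every base model on all data."""
    meta = fit_logistic(out_of_fold_predictions(learners, X, y, folds), y)
    bases = [learn(X, y) for learn in learners]
    return lambda x: meta([model(x) for model in bases])


def presence_learner(j: int) -> Learner:
    """Hypothetical toy base learner: Laplace-smoothed P(resistant | value of feature j)."""
    def learn(X: List[Vector], y: List[int]) -> Model:
        def model(x: Vector) -> float:
            matches = [yi for xi, yi in zip(X, y) if xi[j] == x[j]]
            return (sum(matches) + 1) / (len(matches) + 2)
        return model
    return learn


# Toy data: presence of two hypothetical resistance markers; label 1 = resistant.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0], [0, 0], [0, 0]]
y = [1, 1, 1, 1, 1, 0, 0, 0]
folds = [[0, 2, 5, 7], [1, 3, 4, 6]]
stacked = fit_stacked([presence_learner(0), presence_learner(1)], X, y, folds)
print(stacked([1, 1]) > stacked([0, 0]))  # True
```

Training the meta-learner on in-sample base-model scores would reward models that merely memorize the training set; using out-of-fold scores is what makes stacking usable on small datasets, where a separate hold-out set for the meta-learner would be too costly.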