Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions.

Affiliations

Department of Epidemiology, University of Florida, Gainesville 32610, FL, USA.

Department of Computer Science and Information and Engineering, University of Florida, Gainesville 32611, FL, USA.

Publication information

Artif Intell Med. 2022 Aug;130:102326. doi: 10.1016/j.artmed.2022.102326. Epub 2022 Jun 3.

PMID:35809965

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9425730/

Abstract

Whole genome sequencing (WGS) is quickly becoming the customary means of identifying antimicrobial resistance (AMR) because it yields high-resolution information about the genes and mechanisms that cause resistance and drive pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet AMR prediction tools developed from genotype-phenotype data can be biased, because sampling is not randomized: sample provenance, collection period, and species representation can confound the association of genetic traits with AMR. Prediction models may therefore perform poorly on new data whose sampling distribution has shifted. In this work, under an explicit set of causal assumptions, we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment for antibiotic resistance prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e., DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, and we prepare test data from recent genomes collected in a single country. We test boosted logistic regression (BLR) and random forests (RF), each with and without bias handling. On 10,936 instances, we find evidence of species, location, and year imbalance with respect to the AMR phenotype. The change between crude and bias-adjusted effects of genetic signatures on AMR varies, but only moderately (after selecting the top 20,000 of more than 40 million k-mers). The area under the receiver operating characteristic curve (AUROC) of RF (0.95) is comparable to that of BLR (0.94), both on out-of-bag samples from bootstrap resampling and on the external test set (n = 1085), where AUROCs do not decrease. We observe a 1% to 5% gain in AUROC with bias handling compared with using genetic signatures alone.
In conclusion, we recommend causally informed prediction methods for modeling real-world AMR data; however, traditional adjustment or propensity-based methods may not provide an advantage in every use case, and further methodological development should be pursued.
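
The abstract encodes each genome as a k-mer signature. A minimal sketch of one such encoding follows, assuming binary presence/absence features over a vocabulary built from the training genomes. The function names `kmers`, `build_vocabulary` and `kmer_signature` are hypothetical; the sketch ignores reverse complements and does not reproduce how the authors selected the top 20,000 k-mers.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the distinct length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_vocabulary(genomes: Dict[str, str], k: int) -> List[str]:
    """Union of k-mers across training genomes, sorted for a stable column order."""
    vocab: Set[str] = set()
    for seq in genomes.values():
        vocab |= kmers(seq, k)
    return sorted(vocab)


def kmer_signature(seq: str, k: int, vocabulary: List[str]) -> List[int]:
    """Binary presence (1) / absence (0) vector of `seq` over the vocabulary."""
    present = kmers(seq, k)
    return [1 if km in present else 0 for km in vocabulary]


if __name__ == "__main__":
    # Toy sequences; real bacterial genomes are millions of bases long.
    genomes = {"g1": "ACGTACGT", "g2": "TTACGA"}
    vocab = build_vocabulary(genomes, k=3)
    print(vocab)  # ['ACG', 'CGA', 'CGT', 'GTA', 'TAC', 'TTA']
    for name, seq in genomes.items():
        print(name, kmer_signature(seq, 3, vocab))
```

A set-based encoding keeps the sketch short; at the scale described in the abstract (tens of millions of distinct k-mers), a sparse matrix or hashed features would be the practical choice.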

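Propensity-based rebalancing can be illustrated with inverse probability weighting (IPW). The sketch below illustrates the general technique under stated assumptions and does not reproduce the authors' pipeline. The "exposure" is a hypothetical binary feature (for instance, presence of one k-mer), the confounders are categorical strata such as (species, country, year), and the propensity is estimated as the within-stratum exposure frequency instead of with a fitted model. The sketch compares the crude risk difference in resistance with the IPW-adjusted one, which parallels the crude versus bias-adjusted comparison in the abstract. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Stratum = Tuple[str, ...]  # e.g. (species, country, year)


def stratum_propensity(exposure: Sequence[int], strata: Sequence[Stratum]) -> Dict[Stratum, float]:
    """Empirical P(exposure = 1 | stratum) from counts within each confounder stratum."""
    counts: Dict[Stratum, List[int]] = defaultdict(lambda: [0, 0])  # [exposed, total]
    for a, s in zip(exposure, strata):
        counts[s][0] += a
        counts[s][1] += 1
    return {s: exposed / total for s, (exposed, total) in counts.items()}


def ipw_weights(exposure: Sequence[int], strata: Sequence[Stratum]) -> List[float]:
    """Inverse probability weights: 1/e for exposed, 1/(1 - e) for unexposed."""
    ps = stratum_propensity(exposure, strata)
    weights = []
    for a, s in zip(exposure, strata):
        e = ps[s]
        if not 0.0 < e < 1.0:
            # Every stratum needs both exposed and unexposed isolates (positivity).
            raise ValueError(f"positivity violated in stratum {s}: propensity {e}")
        weights.append(1.0 / e if a == 1 else 1.0 / (1.0 - e))
    return weights


def risk_difference(exposure: Sequence[int], outcome: Sequence[int],
                    weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) P(resistant | exposed) - P(resistant | unexposed)."""
    if weights is None:
        weights = [1.0] * len(exposure)
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for a, y, w in zip(exposure, outcome, weights):
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]


if __name__ == "__main__":
    # Toy data: within each species the feature has no effect, but species A
    # is both more often exposed and more often resistant (confounding).
    strata = [("A",)] * 4 + [("B",)] * 4
    exposure = [1, 1, 1, 0, 1, 0, 0, 0]
    outcome = [1, 1, 1, 1, 0, 0, 0, 0]
    print("crude RD:", risk_difference(exposure, outcome))
    w = ipw_weights(exposure, strata)
    print("IPW RD:", risk_difference(exposure, outcome, w))
```

With these toy numbers, the crude risk difference is 0.5. The IPW estimate is zero up to floating-point error, because the crude association came entirely from species imbalance.
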

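The abstract reports AUROC on bootstrap out-of-bag samples. The following stdlib-only sketch covers both steps and assumes that a model has already produced a resistance score for each isolate. `auroc` uses the pairwise (Mann-Whitney) formulation, and `bootstrap_oob_split` returns the in-bag and out-of-bag indices for one bootstrap replicate. Both names are hypothetical, and fitting BLR or RF is left out.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random resistant isolate > score of a random susceptible one); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg); a rank-based version scales better
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_oob_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample n indices with replacement (in-bag); indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(labels, scores))  # 0.75

    rng = random.Random(0)
    in_bag, oob = bootstrap_oob_split(10, rng)
    # A model would be fit on the in_bag rows and scored with auroc() on the oob rows.
    print(sorted(set(in_bag)), oob)
```

A full evaluation would repeat the split over many bootstrap replicates. The same `auroc` function would then score the external test set, which in this study consists of recent genomes from a single country.
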
Similar articles

Whole genome sequencing (WGS) fails to detect antimicrobial resistance (AMR) from heteroresistant subpopulation of Salmonella enterica.

Food Microbiol. 2020 Oct;91:103530. doi: 10.1016/j.fm.2020.103530. Epub 2020 Apr 25.

VAMPr: VAriant Mapping and Prediction of antibiotic resistance via explainable features and machine learning.

PLoS Comput Biol. 2020 Jan 13;16(1):e1007511. doi: 10.1371/journal.pcbi.1007511. eCollection 2020 Jan.

Keeping up with the pathogens: improved antimicrobial resistance detection and prediction from Pseudomonas aeruginosa genomes.

Genome Med. 2024 Jun 7;16(1):78. doi: 10.1186/s13073-024-01346-z.

WGS-Based Prediction and Analysis of Antimicrobial Resistance in Isolates From Israel.

Front Cell Infect Microbiol. 2020 Aug 13;10:365. doi: 10.3389/fcimb.2020.00365. eCollection 2020.

Prediction of Antimicrobial Resistance in Clinical Enterococcus faecium Isolates Using a Rules-Based Analysis of Whole-Genome Sequences.

Antimicrob Agents Chemother. 2022 Jan 18;66(1):e0119621. doi: 10.1128/AAC.01196-21. Epub 2021 Oct 25.

Scalable de novo classification of antibiotic resistance of Mycobacterium tuberculosis.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i39-i47. doi: 10.1093/bioinformatics/btae243.

Use of online tools for antimicrobial resistance prediction by whole-genome sequencing in methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant enterococci (VRE).

J Glob Antimicrob Resist. 2019 Dec;19:136-143. doi: 10.1016/j.jgar.2019.04.006. Epub 2019 Apr 18.

Salmonella enterica Serovar Typhi in Bangladesh: Exploration of Genomic Diversity and Antimicrobial Resistance.

mBio. 2018 Nov 13;9(6):e02112-18. doi: 10.1128/mBio.02112-18.

Molecular Insights into Antimicrobial Resistance Traits of Commensal Human Gut Microbiota.

Microb Ecol. 2019 Feb;77(2):546-557. doi: 10.1007/s00248-018-1228-7. Epub 2018 Jul 16.

Cited by

Towards routine employment of computational tools for antimicrobial resistance determination via high-throughput sequencing.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac020.

