Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions.

Author affiliations

Department of Epidemiology, University of Florida, Gainesville 32610, FL, USA.

Department of Computer Science and Information and Engineering, University of Florida, Gainesville 32611, FL, USA.

Publication information

Artif Intell Med. 2022 Aug;130:102326. doi: 10.1016/j.artmed.2022.102326. Epub 2022 Jun 3.

Abstract

Whole genome sequencing (WGS) is quickly becoming the customary means of identifying antimicrobial resistance (AMR) because it yields high-resolution information about the genes and mechanisms that cause resistance and drive pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenance, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work, under an explicit set of causal assumptions, we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on antibiotic resistance prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e., DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, preparing test data with recent genomes coming from a single country. We test boosted logistic regression (BLR) and random forests (RF) with and without bias-handling. On 10,936 instances, we find evidence of species, location, and year imbalance with respect to the AMR phenotype. The crude versus bias-adjusted change in effect of genetic signatures on AMR varies, but only moderately (selecting the top 20,000 out of 40+ million k-mers). The area under the receiver operating characteristic (AUROC) of the RF (0.95) is comparable to that of BLR (0.94) on both out-of-bag samples from bootstrap and the external test (n = 1085), where AUROCs do not decrease. We observe a 1%-5% gain in AUROC with bias-handling compared to the sole use of genetic signatures. In conclusion, we recommend using causally-informed prediction methods for modeling real-world AMR data; however, traditional adjustment or propensity-based methods may not provide an advantage in all use cases, and further methodological development should be sought.
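
As a rough illustration of two ideas named in the abstract, the sketch below encodes a sequence as a set of k-mers and computes rebalancing weights that make the AMR label independent of one categorical confounder (for example, species) in the weighted sample. It is a minimal sketch and not the authors' pipeline: the function names `kmer_set` and `rebalancing_weights`, the toy labels, and the choice of weight P(y) / P(y | stratum) estimated from raw counts are all assumptions made for illustration. The abstract does not specify how the propensity model was fitted.

```python
from collections import Counter


def kmer_set(sequence, k):
    """Return the distinct length-k substrings (k-mers) of a DNA sequence."""
    seq = sequence.upper()
    # An empty set is returned when the sequence is shorter than k.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rebalancing_weights(labels, strata):
    """Weight each sample by P(y) / P(y | stratum), estimated from counts.

    Hypothetical helper: in the weighted sample, the label distribution is
    the same in every stratum, so the stratum no longer predicts the label.
    """
    n = len(labels)
    label_counts = Counter(labels)
    stratum_counts = Counter(strata)
    joint_counts = Counter(zip(labels, strata))
    weights = []
    for y, s in zip(labels, strata):
        p_y = label_counts[y] / n
        p_y_given_s = joint_counts[(y, s)] / stratum_counts[s]
        weights.append(p_y / p_y_given_s)
    return weights


if __name__ == "__main__":
    print(sorted(kmer_set("ACGTAC", 3)))  # ['ACG', 'CGT', 'GTA', 'TAC']

    # Toy data: resistant (R) isolates are over-represented in stratum A.
    labels = ["R", "R", "R", "S", "R", "S", "S", "S"]
    strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
    for y, s, w in zip(labels, strata, rebalancing_weights(labels, strata)):
        print(y, s, round(w, 3))
```

Weights of this kind could then be passed as per-sample weights to a classifier that supports them. With several confounders (country, year, species), the conditional probability would more plausibly come from a fitted model than from raw counts.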

