Department of Physics, Washington University, St. Louis, MO, USA.
Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
Mol Biol Evol. 2024 Aug 2;41(8). doi: 10.1093/molbev/msae152.
Measuring the fitnesses of genetic variants is a fundamental objective in evolutionary biology. A standard approach for measuring microbial fitnesses in bulk involves labeling a library of genetic variants with unique sequence barcodes, competing the labeled strains in batch culture, and using deep sequencing to track changes in the barcode abundances over time. However, idiosyncratic properties of barcodes can induce nonuniform amplification or uneven sequencing coverage that causes some barcodes to be over- or under-represented in samples. This systematic bias can result in erroneous read count trajectories and misestimates of fitness. Here, we develop a computational method, named REBAR (Removing the Effects of Bias through Analysis of Residuals), for inferring the effects of barcode processing bias by leveraging the structure of systematic deviations in the data. We illustrate this approach by applying it to two independent data sets, and demonstrate that this method estimates and corrects for bias more accurately than standard proxies, such as GC-based corrections. REBAR mitigates bias and improves fitness estimates in high-throughput assays without introducing additional complexity to the experimental protocols, with potential applications in a range of experimental evolution and mutation screening contexts.
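As a rough illustration of the standard bulk approach described above (not of REBAR itself), the sketch below estimates relative fitness as the slope of each barcode's log frequency over time, measured against a reference barcode. The function names, the 0.5 pseudocount, the use of a single reference barcode, and the toy read counts are all assumptions made for this example; they are not taken from the paper.

```python
import math

def log_frequency_slopes(counts, times):
    """Least-squares slope of log barcode frequency versus time.

    counts: dict mapping barcode -> list of read counts, one per time point.
    times:  list of sampling times (e.g. generations), same length as each list.
    """
    n_t = len(times)
    # Total reads per time point, used to convert counts to frequencies.
    totals = [sum(c[t] for c in counts.values()) for t in range(n_t)]
    t_mean = sum(times) / n_t
    denom = sum((t - t_mean) ** 2 for t in times)
    slopes = {}
    for barcode, c in counts.items():
        # Hypothetical 0.5 pseudocount to avoid log(0) for dropped-out barcodes.
        y = [math.log((c[t] + 0.5) / totals[t]) for t in range(n_t)]
        y_mean = sum(y) / n_t
        cov = sum((times[i] - t_mean) * (y[i] - y_mean) for i in range(n_t))
        slopes[barcode] = cov / denom
    return slopes

def relative_fitness(counts, times, reference):
    """Fitness of each barcode relative to a chosen reference barcode."""
    slopes = log_frequency_slopes(counts, times)
    return {bc: s - slopes[reference] for bc, s in slopes.items()}

# Toy data (invented for illustration): one reference and two variants.
times = [0, 1, 2, 3]
counts = {
    "ref":  [1000, 1000, 1000, 1000],
    "fast": [100, 200, 400, 800],
    "slow": [800, 400, 200, 100],
}
fitness = relative_fitness(counts, times, reference="ref")
```

In this simple estimator, any barcode-specific distortion of the read counts that varies between samples feeds directly into the fitted slope, which is the kind of misestimate the abstract attributes to barcode processing bias.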