Mok Amanda, Tunney Robert, Benegas Gonzalo, Wallace Edward W J, Lareau Liana F
Center for Computational Biology, University of California, Berkeley.
School of Biological Sciences, University of Edinburgh.
bioRxiv. 2023 Feb 22:2023.02.21.529452. doi: 10.1101/2023.02.21.529452.
Ribosome profiling quantifies translation genome-wide by sequencing ribosome-protected fragments, or footprints. Its single-codon resolution allows identification of translation regulation, such as ribosome stalls or pauses, on individual genes. However, enzyme preferences during library preparation lead to pervasive sequence artifacts that obscure translation dynamics. Widespread over- and under-representation of ribosome footprints can dominate local footprint densities and skew estimates of elongation rates by up to fivefold. To address these biases and uncover true patterns of translation, we present choros, a computational method that models ribosome footprint distributions to provide bias-corrected footprint counts. choros uses negative binomial regression to estimate two sets of parameters: (i) biological contributions from codon-specific translation elongation rates; and (ii) technical contributions from nuclease digestion and ligation efficiencies. We use these parameter estimates to generate bias correction factors that eliminate sequence artifacts. Applying choros to multiple ribosome profiling datasets, we accurately quantify and attenuate ligation biases, providing more faithful measurements of ribosome distribution. We show that a pattern interpreted as pervasive ribosome pausing near the beginning of coding regions is likely to arise from technical biases. Incorporating choros into standard analysis pipelines will improve biological discovery from measurements of translation.
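The correction step described in the abstract can be illustrated with a minimal sketch. It assumes a count model with a log link, so that technical terms multiply the expected footprint count and can be divided out. The coefficient values, the end-sequence labels, and the names `BIAS_COEF`, `correction_factor`, and `corrected_count` are hypothetical and are not taken from choros itself.

```python
import math

# Hypothetical log-scale bias coefficients, as might come out of a fitted
# count regression with a log link:
#   log E[count] = transcript term + codon terms + bias terms
# The values below are illustrative only, not estimates from the paper.
BIAS_COEF = {
    ("5p_end", "AA"): 0.0,            # reference level
    ("5p_end", "CG"): math.log(2.0),  # fragment end recovered 2x too often
    ("3p_end", "TT"): 0.0,            # reference level
    ("3p_end", "GC"): math.log(0.5),  # fragment end recovered half as often
}


def correction_factor(five_prime: str, three_prime: str) -> float:
    """Multiplicative factor that cancels the modeled technical terms."""
    log_bias = (
        BIAS_COEF[("5p_end", five_prime)]
        + BIAS_COEF[("3p_end", three_prime)]
    )
    # Under a log link, bias terms add on the log scale, so they divide out
    # of the count on the natural scale.
    return math.exp(-log_bias)


def corrected_count(raw_count: float, five_prime: str, three_prime: str) -> float:
    """Scale a raw footprint count to its bias-corrected value."""
    return raw_count * correction_factor(five_prime, three_prime)


if __name__ == "__main__":
    for ends in [("AA", "TT"), ("CG", "TT"), ("AA", "GC"), ("CG", "GC")]:
        print(ends, corrected_count(10, *ends))
```

In this sketch, a footprint class whose ends are over-represented is scaled down and an under-represented class is scaled up, while the codon-specific (biological) terms are left untouched.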