Enhancing RNA-seq bias mitigation with the Gaussian self-benchmarking framework: towards unbiased sequencing data.

Affiliations

Faculty of Synthetic Biology, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Shenzhen University of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, China.

Publication information

BMC Genomics. 2024 Sep 30;25(1):904. doi: 10.1186/s12864-024-10814-0.

DOI:10.1186/s12864-024-10814-0

PMID:39350040

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11441123/

Abstract

BACKGROUND

RNA sequencing is a vital technique for analyzing RNA behavior in cells, but it often suffers from various biases that distort the data. Traditional methods to address these biases are typically empirical and handle them individually, limiting their effectiveness. Our study introduces the Gaussian Self-Benchmarking (GSB) framework, a novel approach that leverages the natural distribution patterns of guanine (G) and cytosine (C) content in RNA to mitigate multiple biases simultaneously. This method is grounded in a theoretical model, organizing k-mers based on their GC content and applying a Gaussian model for alignment to ensure empirical sequencing data closely match their theoretical distribution.
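The idea of grouping k-mers by GC content and comparing the aggregated counts with a Gaussian can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmer_counts_by_gc`, `gaussian_weights`, `correction_factors`), the use of a simple expected-to-observed ratio as a correction factor, and the example values for `mean` and `sd` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def gc_count(kmer: str) -> int:
    """Number of G or C bases in a k-mer."""
    return sum(base in "GC" for base in kmer.upper())


def kmer_counts_by_gc(reads, k):
    """Aggregate k-mer occurrences across reads into bins indexed by GC count (0..k)."""
    bins = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bins[gc_count(read[i:i + k])] += 1
    return [bins.get(g, 0) for g in range(k + 1)]


def gaussian_weights(k, mean, sd):
    """Gaussian density evaluated at each GC count, normalized to sum to 1."""
    dens = [math.exp(-0.5 * ((g - mean) / sd) ** 2) for g in range(k + 1)]
    total = sum(dens)
    return [d / total for d in dens]


def correction_factors(observed, mean, sd):
    """Ratio of Gaussian-expected to observed counts per GC bin (None for empty bins).

    Assumption: a plain ratio is used here as a stand-in for whatever
    adjustment the GSB framework actually applies.
    """
    k = len(observed) - 1
    total = sum(observed)
    expected = [w * total for w in gaussian_weights(k, mean, sd)]
    return [e / o if o > 0 else None for e, o in zip(expected, observed)]


# Hypothetical toy input; mean and sd are placeholder values, not fitted ones.
reads = ["ACGTGGCA", "TTGCAAGC"]
observed = kmer_counts_by_gc(reads, k=4)
factors = correction_factors(observed, mean=2.0, sd=1.0)
```

In this sketch a factor above 1 marks a GC bin that is under-represented relative to the assumed Gaussian, and a factor below 1 marks an over-represented bin. How the pre-determined mean and standard deviation are obtained is described in the full paper and is not reproduced here.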

RESULTS

The GSB framework demonstrated superior performance in mitigating sequencing biases compared to existing methods. Testing with synthetic RNA constructs and real human samples showed that the GSB approach not only addresses individual biases more effectively but also manages co-existing biases jointly. The framework's reliance on accurately pre-determined parameters like mean and standard deviation of GC content distribution allows for a more precise representation of RNA samples. This results in improved accuracy and reliability of RNA sequencing data, enhancing our understanding of RNA behavior in health and disease.

CONCLUSIONS

The GSB framework presents a significant advancement in RNA sequencing analysis by providing a well-validated, multi-bias mitigation strategy. It functions independently from previously identified dataset flaws and sets a new standard for unbiased RNA sequencing results. This development enhances the reliability of RNA studies, broadening the potential for scientific breakthroughs in medicine and biology, particularly in genetic disease research and the development of targeted treatments.
