Sullivan Delaney K, Hjörleifsson Kristján Eldjárn, Swarna Nikhila P, Oakes Conrad, Holley Guillaume, Melsted Páll, Pachter Lior
Division of Biology and Biological Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA.
UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, 885 Tiverton Drive, Los Angeles, CA 90095, USA.
Nucleic Acids Res. 2025 Jan 7;53(1). doi: 10.1093/nar/gkae1137.
In single-cell and single-nucleus RNA sequencing (RNA-seq), the coexistence of nascent (unprocessed) and mature (processed) messenger RNA (mRNA) poses challenges in accurate read mapping and the interpretation of count matrices. The traditional transcriptome reference, defining the "region of interest" in bulk RNA-seq, restricts its focus to mature mRNA transcripts. This restriction leads to two problems: reads originating outside of the "region of interest" are prone to mismapping within this region, and additionally, such external reads cannot be matched to specific transcript targets. Expanding the "region of interest" to encompass both nascent and mature mRNA transcript targets provides a more comprehensive framework for RNA-seq analysis. Here, we introduce the concept of distinguishing flanking k-mers (DFKs) to improve mapping of sequencing reads. We have developed an algorithm to identify DFKs, which serve as a sophisticated "background filter", enhancing the accuracy of mRNA quantification. This dual strategy of an expanded region of interest coupled with the use of DFKs enhances the precision in quantifying both mature and nascent mRNA molecules, as well as in delineating reads of ambiguous status.
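The abstract does not spell out how DFKs are identified, so the following is only a toy illustration of the general idea of a "background filter", not the authors' algorithm. It assumes that a flanking k-mer can be approximated as a k-mer that occurs in a background (outside the region of interest) sequence, is absent from the target sequences, and sits directly next to a k-mer shared with the targets. The function names `kmers`, `find_flanking_kmers`, and `read_is_background` are hypothetical, and the sketch ignores reverse complements and de Bruijn graph structure.

```python
def kmers(seq, k):
    """Return the k-mers of seq in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def find_flanking_kmers(targets, background, k):
    """Toy search (assumption, not the published method): collect k-mers from
    background sequences that are absent from the targets but directly
    adjacent to a k-mer that the background shares with the targets."""
    target_set = {km for t in targets for km in kmers(t, k)}
    flanking = set()
    for seq in background:
        kms = kmers(seq, k)
        for i, km in enumerate(kms):
            if km in target_set:
                continue  # shared k-mers cannot distinguish the two origins
            left_shared = i > 0 and kms[i - 1] in target_set
            right_shared = i + 1 < len(kms) and kms[i + 1] in target_set
            if left_shared or right_shared:
                flanking.add(km)
    return flanking


def read_is_background(read, flanking, k):
    """Flag a read as background if it contains any flanking k-mer."""
    return any(km in flanking for km in kmers(read, k))


# Hypothetical toy sequences: the background shares "ACGTAC" with the target.
targets = ["ACGTACGGA"]
background = ["TTACGTACCC"]
flanking = find_flanking_kmers(targets, background, k=4)

# A read that runs from the shared stretch into background-only sequence
# is flagged; a read that stays within the target is not.
flag_outside = read_is_background("CGTACC", flanking, k=4)
flag_inside = read_is_background("CGTACGG", flanking, k=4)
```

In this toy setup, a read that overlaps the shared stretch would otherwise look compatible with the target; the flanking k-mers are what reveal that it extends into sequence found only in the background.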