IRMB, INSERM U1183, Hopital Saint-Eloi, Universite de Montpellier, Montpellier, France.
CRCT, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France.
Genome Biol. 2024 Oct 10;25(1):266. doi: 10.1186/s13059-024-03413-5.
Indexing techniques relying on k-mers have proven effective in searching for RNA sequences across thousands of RNA-seq libraries, but without enabling direct RNA quantification. We show here that arbitrary RNA sequences can be quantified in seconds through their decomposition into k-mers, with a precision akin to that of conventional RNA quantification methods. Using an index of the Cancer Cell Line Encyclopedia (CCLE) collection consisting of 1019 RNA-seq samples, we show that k-mer indexing offers a powerful means to reveal non-reference sequences, and variant RNAs induced by specific gene alterations, for instance in splicing factors.
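The idea of quantifying a sequence through its k-mers can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy in-memory index (a plain dictionary of k-mer counts per sample), a small k chosen for readability, no reverse-complement handling, no normalization for library size, and the mean k-mer count as the summary statistic. The function names `kmers`, `build_index`, and `quantify` and the sample data are hypothetical. An index over 1019 RNA-seq samples would need a far more compact data structure.

```python
from collections import Counter
from statistics import mean


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each sample name to a Counter of k-mer occurrences in its reads.

    Toy stand-in for a real k-mer index: everything is held in memory.
    """
    return {
        name: Counter(km for read in reads for km in kmers(read, k))
        for name, reads in samples.items()
    }


def quantify(query, index, k):
    """Return, per sample, the mean count of the query's k-mers.

    Using the mean is an assumption made for this sketch; other
    summaries (e.g. the median) are equally possible.
    """
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        raise ValueError("query is shorter than k")
    return {
        name: mean(counts[km] for km in query_kmers)
        for name, counts in index.items()
    }


# Hypothetical reads for two samples (illustrative data only).
samples = {
    "sample_A": ["ACGTACGTAC", "CGTACGTACG"],
    "sample_B": ["TTTTGGGGCC"],
}
index = build_index(samples, k=5)
result = quantify("ACGTACG", index, k=5)
print(result)
```

Because the query is matched k-mer by k-mer rather than aligned to a reference, any sequence can be looked up, including one absent from the reference annotation. A sample whose reads contain none of the query's k-mers gets a value of zero.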