Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence on Generative AI, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
Cell Genom. 2024 Sep 11;4(9):100641. doi: 10.1016/j.xgen.2024.100641. Epub 2024 Aug 30.
Colorectal cancer (CRC) is the second leading cause of cancer death worldwide. In recent years, short-read single-cell RNA sequencing (scRNA-seq) has been instrumental in deciphering tumor heterogeneity. However, such studies enable only gene-level quantification and neglect alterations in transcript structure that arise from alternative end processing or splicing. In this study, we integrated short- and long-read scRNA-seq of CRC samples to build an isoform-resolution CRC transcriptomic atlas. First, we identified 394 dysregulated transcript structures in tumor epithelial cells, including 299 that result from various combinations of splicing events. Second, we characterized genes and isoforms associated with epithelial lineages and with subpopulations that exhibit distinct prognoses. Among 31,935 isoforms with novel junctions, 330 were supported by The Cancer Genome Atlas RNA-seq and mass spectrometry data. Finally, we built an algorithm that integrates novel peptides derived from open reading frames of recurrent tumor-specific transcripts with mass spectrometry data, and we used it to identify recurring neoepitopes that may aid the development of cancer vaccines.