Arakelyan Arsen, Sirunyan Tamara, Khachatryan Gisane, Hakobyan Siras, Minasyan Arpine, Nikoghosyan Maria, Hakobyan Meline, Chavushyan Andranik, Martirosyan Gevorg, Hakobyan Yervand, Binder Hans
Institute of Molecular Biology NAS RA, Yerevan 0014, Armenia.
Institute of Biomedicine and Pharmacy, Russian-Armenian University, Yerevan 0051, Armenia.
Cancers (Basel). 2025 Mar 13;17(6):964. doi: 10.3390/cancers17060964.
Massively parallel sequencing technologies have advanced chronic lymphocytic leukemia (CLL) diagnostics and precision oncology. Illumina platforms offer robust performance but require substantial infrastructure investment and a large number of samples to be cost-efficient. Conversely, third-generation long-read nanopore sequencing from Oxford Nanopore Technologies (ONT) can significantly reduce sequencing costs, making it a valuable tool in resource-limited settings. However, nanopore sequencing has lower accuracy and throughput than Illumina platforms, which necessitates additional computational strategies. In this paper, we demonstrate that integrating publicly available short-read data with in-house generated ONT data, along with the application of machine learning approaches, enables the characterization of the CLL transcriptome landscape, the identification of clinically relevant molecular subtypes, and the assignment of these subtypes to nanopore-sequenced samples.
Public Illumina RNA sequencing data for 608 CLL samples were obtained from the CLL-Map Portal. CLL transcriptome analysis, gene module identification, and transcriptomic subtype classification were performed using the oposSOM R package, which visualizes high-dimensional data with self-organizing maps (SOMs). Eight CLL patients were recruited from the Hematology Center After Prof. R. Yeolyan (Yerevan, Armenia). Sequencing libraries were prepared from blood total RNA using the PCR-cDNA sequencing-barcoding kit (SQK-PCB109) following the manufacturer's protocol and sequenced on an R9.4.1 flow cell for 24–48 h. Raw reads were converted to transcripts-per-million (TPM) values. These data were projected into the SOM space with the supervised SOM portrayal (supSOM) approach, which uses support vector machine regression to predict the SOM portraits of new samples.
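The conversion to TPM values mentioned above can be illustrated with a minimal sketch. It assumes the textbook TPM definition (read counts divided by gene length in kilobases, then rescaled so that each sample sums to one million); the abstract does not state which tool or length handling was actually used, and the function name `counts_to_tpm`, the gene names, and all numbers below are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts to TPM (textbook definition, an assumption here).

    counts      -- dict mapping gene name to read count for one sample
    lengths_bp  -- dict mapping gene name to gene/transcript length in base pairs
    """
    if counts.keys() != lengths_bp.keys():
        raise ValueError("counts and lengths_bp must cover the same genes")

    # Step 1: reads per kilobase, which corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}

    # Step 2: rescale so that the values of one sample sum to one million.
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / total_rpk * 1e6 for gene, value in rpk.items()}


# Hypothetical toy data, not taken from the study.
toy_counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 0}
toy_lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 500}
tpm = counts_to_tpm(toy_counts, toy_lengths)
```

In this toy sample GENE_B has three times the reads of GENE_A but is also three times longer, so both receive the same TPM value. Because every sample sums to the same total, TPM values are comparable across samples, which is presumably what makes them a workable input for projecting new samples into a reference SOM space.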
The CLL transcriptomic landscape revealed disruptions in gene modules (spots) associated with T cell cytotoxicity, B and T cell activation, inflammation, cell cycle, DNA repair, proliferation, and splicing. One specific gene module contained genes associated with poor prognosis in CLL. Accordingly, CLL samples were classified into T-cell cytotoxic, immune, proliferative, and splicing types, plus three mixed types: proliferative-immune, proliferative-splicing, and proliferative-immune-splicing. These transcriptomic subtypes were associated with survival independently of gender and mutation status. Using supervised machine learning approaches, transcriptomic subtypes were assigned to the patient samples sequenced with nanopore technology.
This study demonstrates that the CLL transcriptome landscape can be parsed into functional modules, revealing distinct molecular subtypes based on proliferative and immune activity, with important implications for prognosis and treatment that are orthogonal to other molecular classifications. Additionally, the integration of nanopore sequencing with public datasets and machine learning offers a cost-effective approach to molecular subtyping and prognostic prediction, facilitating more accessible and personalized CLL care.