Gu Zhaohui, Hu Zunsong, Jia Zhilian, Liu Jiangyue, Mao Allen, Han Helen
City of Hope.
Res Sq. 2023 Apr 14:rs.3.rs-2798895. doi: 10.21203/rs.3.rs-2798895/v1.
B-cell acute lymphoblastic leukemia (B-ALL) comprises dozens of subtypes defined by distinct gene expression profiles (GEPs) and various genetic lesions. The application of transcriptome sequencing (RNA-seq) has identified multiple novel subtypes, leading to a more refined B-ALL classification and risk-stratification system. However, the complexity of analyzing RNA-seq data for B-ALL classification hinders the implementation of the new B-ALL taxonomy. Here, we introduce MD-ALL (Molecular Diagnosis of ALL), a user-friendly platform for sensitive and accurate B-ALL classification based on GEPs and sentinel genetic alterations. In this study, we systematically analyzed 2,955 B-ALL RNA-seq samples and generated a reference dataset representing all reported B-ALL subtypes. Using multiple machine learning algorithms, we identified feature genes and then established highly accurate models for B-ALL classification using either bulk or single-cell RNA-seq data. Importantly, the platform integrates key genetic lesions, including sequence mutations, large-scale copy number variations, and gene rearrangements, to perform comprehensive and definitive B-ALL classification. In validation on a hold-out cohort of 974 samples, our models outperformed alternative tools for B-ALL classification. In summary, MD-ALL enables integrative, accurate, and comprehensive B-ALL subtype classification.
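The idea of combining a GEP-based prediction with sentinel genetic lesions can be illustrated with a minimal sketch. This is not the MD-ALL implementation: the nearest-centroid rule stands in for the (unspecified) machine learning models, the feature genes `GENE_A`, `GENE_B`, `GENE_C` and all expression values are made-up placeholders, and the function names are hypothetical. The only domain facts assumed are that an ETV6::RUNX1 fusion marks the ETV6::RUNX1 subtype and a BCR::ABL1 fusion marks the Ph subtype.

```python
from math import sqrt

# Toy reference centroids: mean expression per subtype over hypothetical
# feature genes. All numbers are invented for illustration only.
CENTROIDS = {
    "ETV6::RUNX1": {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 1.0},
    "Ph": {"GENE_A": 1.5, "GENE_B": 7.5, "GENE_C": 2.0},
    "KMT2A": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 9.0},
}

# Sentinel lesion -> subtype it supports (deliberately tiny, illustrative map).
SENTINEL_FUSIONS = {
    "ETV6::RUNX1": "ETV6::RUNX1",
    "BCR::ABL1": "Ph",
}


def predict_from_gep(expr):
    """Return the subtype whose centroid is closest (Euclidean) to expr."""
    distances = {
        subtype: sqrt(sum((expr[g] - centroid[g]) ** 2 for g in centroid))
        for subtype, centroid in CENTROIDS.items()
    }
    return min(distances, key=distances.get)


def classify(expr, fusions):
    """Combine the GEP prediction with any detected sentinel fusions.

    Returns (subtype, evidence), where evidence says whether the lesion
    data agreed with, contradicted, or was absent for the GEP call.
    """
    gep_subtype = predict_from_gep(expr)
    lesion_subtypes = {SENTINEL_FUSIONS[f] for f in fusions if f in SENTINEL_FUSIONS}
    if gep_subtype in lesion_subtypes:
        return gep_subtype, "concordant"
    if lesion_subtypes:
        return gep_subtype, "discordant: review"
    return gep_subtype, "GEP only"


sample = {"GENE_A": 7.5, "GENE_B": 2.5, "GENE_C": 1.2}
print(classify(sample, ["ETV6::RUNX1"]))
```

Flagging discordant cases for review rather than letting one evidence type silently override the other is a design choice of this sketch, not a description of how the platform resolves conflicts.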