Thrun Michael C, Mack Elisabeth K M, Neubauer Andreas, Haferlach Torsten, Frech Miriam, Ultsch Alfred, Brendel Cornelia
Department of Mathematics and Computer Science, Philipps-University Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany.
Department of Hematology, Oncology and Immunology, Philipps-University Marburg, 35043 Marburg, Germany.
Bioengineering (Basel). 2022 Nov 3;9(11):642. doi: 10.3390/bioengineering9110642.
"Big omics data" provoke the challenge of extracting meaningful information with clinical benefit. Here, we propose a two-step approach, an initial unsupervised inspection of the structure of the high dimensional data followed by supervised analysis of gene expression levels, to reconstruct the surface patterns on different subtypes of acute myeloid leukemia (AML). First, Bayesian methodology was used, focusing on surface molecules encoded by cluster of differentiation (CD) genes to assess whether AML is a homogeneous group or segregates into clusters. Gene expressions of 390 patient samples measured using microarray technology and 150 samples measured via RNA-Seq were compared. Beyond acute promyelocytic leukemia (APL), a well-known AML subentity, the remaining AML samples were separated into two distinct subgroups. Next, we investigated which CD molecules would best distinguish each AML subgroup against APL, and validated discriminative molecules of both datasets by searching the scientific literature. Surprisingly, a comparison of both omics analyses revealed that CD339 was the only overlapping gene differentially regulated in APL and other AML subtypes. In summary, our two-step approach for gene expression analysis revealed two previously unknown subgroup distinctions in AML based on surface molecule expression, which may guide the differentiation of subentities in a given clinical-diagnostic context.