Park Heewon, Miyano Satoru
School of Mathematics, Statistics and Data Science, Sungshin Women's University, Seoul, Republic of Korea.
M&D Data Science Center, Tokyo Medical and Dental University, Tokyo, Japan.
J Comput Biol. 2024 Nov;31(11):1158-1178. doi: 10.1089/cmb.2024.0539. Epub 2024 Sep 6.
We characterize young and aged cell lines, both healthy and acute myeloid leukemia (AML), with the goal of identifying key markers associated with AML progression. To characterize age-related phenotypes in AML cell lines, we consider eigenCell analysis, which effectively captures the primary expression-level patterns across cell lines. However, earlier eigenGene and eigenCell analyses were based on linear combinations of all features and were therefore disturbed by noise features. Moreover, a fully dense loading matrix makes the results of eigenCell analysis difficult to interpret. To address these challenges, we develop a novel computational approach, termed network-constrained eigenCells profile estimation, that employs a sparse learning strategy. The proposed method estimates eigenCells using not only the lasso but also a network-constrained penalty. The network-constrained penalty enables neighboring genes in the network to be selected simultaneously, so hub genes and their regulator/target genes are readily selected as crucial markers for eigenCell estimation. In other words, our method incorporates insights from network biology into sparse loading estimation. With this methodology, we estimate sparse eigenCell profiles in which only critical markers have nonzero expression levels, which allows us to identify the key markers associated with a specific phenotype. Monte Carlo simulations demonstrate the efficacy of our method in reconstructing the sparse structure of eigenCell profiles. We applied our approach to uncover the regulatory system of immunogenes in young and aged cell lines, both healthy and AML. The markers we identified for the age-related phenotype in both healthy and AML cell lines are strongly supported by previous studies. Specifically, our findings, together with the existing literature, indicate that activity within the CD79A subnetwork could be pivotal in elucidating the mechanism driving AML progression; in particular, diminished activity in the CD79A subnetwork appears to play a significant role. We expect the proposed method to be a useful tool for characterizing disease-related subsets of cell lines, including phenotypes and clones.
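The combination of a lasso penalty with a network-constrained penalty described above can be illustrated with a minimal sketch. This is not the authors' estimator: it is a generic rank-one sparse decomposition, assuming the network penalty takes the common form of a quadratic term in the normalized graph Laplacian of the gene network. The function names (`normalized_laplacian`, `network_sparse_loading`), the tuning parameters `lam1` and `lam2`, and the alternating coordinate-descent scheme are all hypothetical choices made for illustration.

```python
import numpy as np

def normalized_laplacian(adj):
    """Normalized graph Laplacian of a symmetric adjacency matrix.
    Isolated nodes get an all-zero row and column."""
    deg = adj.sum(axis=1)
    mask = deg > 0
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    d_inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])
    return np.diag(mask.astype(float)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def soft_threshold(x, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def network_sparse_loading(X, adj, lam1, lam2, n_iter=50, n_sweeps=20):
    """Assumed objective (illustrative only), with ||u|| = 1:
        ||X - u v^T||_F^2 + lam1 * ||v||_1 + lam2 * v^T L v
    X is samples x genes, adj is the genes x genes network, and v is the
    sparse loading vector over genes."""
    n, p = X.shape
    A = np.eye(p) + lam2 * normalized_laplacian(adj)
    # Initialize from the leading singular vectors of X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]
    v = s[0] * Vt[0]
    for _ in range(n_iter):
        z = X.T @ u
        # Coordinate descent on v for fixed u.
        for _ in range(n_sweeps):
            for j in range(p):
                r = A[j] @ v - A[j, j] * v[j]  # contribution of the other genes
                v[j] = soft_threshold(z[j] - r, lam1 / 2.0) / A[j, j]
        score = X @ v
        norm = np.linalg.norm(score)
        if norm == 0.0:  # every loading was shrunk to zero
            break
        u = score / norm
    return u, v
```

In this sketch, `lam1` controls how many genes receive a nonzero loading, while `lam2` pulls the loadings of connected genes toward each other (after degree scaling), which is what lets a hub gene and its neighbors enter the profile together. Setting both to zero reduces the procedure to ordinary power iteration for the leading singular vector.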