M A Basher Abdur Rahman, Hallinan Caleb, Lee Kwonmoo
Vascular Biology Program, Boston Children's Hospital, Boston, MA, USA.
Department of Surgery, Harvard Medical School, Boston, MA, USA.
Nat Commun. 2025 Apr 16;16(1):3593. doi: 10.1038/s41467-025-58718-1.
Disease-specific subtype identification can deepen our understanding of disease progression and pave the way for personalized therapies, given the complexity of disease heterogeneity. Large-scale transcriptomic, proteomic, and imaging datasets create opportunities for discovering subtypes but also pose challenges due to their high dimensionality. To mitigate this, many feature selection methods focus on selecting features that distinguish known diseases or cell states, yet often miss features that preserve heterogeneity and reveal new subtypes. To address this gap, we develop Preserving Heterogeneity (PHet), a statistical methodology that employs iterative subsampling and differential analysis of interquartile range, in conjunction with Fisher's method, to identify a small set of features that enhance subtype clustering quality. Here, we show that this method can maintain sample heterogeneity while distinguishing known disease/cell states, with a tendency to outperform previous differential expression and outlier-based methods, indicating its potential to advance our understanding of disease mechanisms and cell differentiation.
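The statistical ingredients named above are repeated subsampling, a per-feature test for a difference in interquartile range (IQR) between two known states, and pooling of the per-subsample p-values with Fisher's method. They can be illustrated with a small sketch. This is not the PHet implementation. The abstract does not say how each subsample is tested, so the permutation test on the absolute IQR difference is an assumption made for illustration. The same applies to the function names (`subsampled_iqr_scores`, `iqr_perm_pvalue`, `fisher_combined_pvalue`) and all default parameters, which are hypothetical. The sketch uses only the Python standard library. Fisher's statistic X = -2 Σ ln p_i is compared against a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1 (stdlib 'exclusive' quartiles; needs >= 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def iqr_perm_pvalue(a, b, n_perm, rng):
    """Permutation p-value for |IQR(a) - IQR(b)| (hypothetical per-subsample test)."""
    observed = abs(iqr(a) - iqr(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly relabel samples between the two groups
        if abs(iqr(pooled[:len(a)]) - iqr(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k dof under the null."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for even dof 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return min(1.0, math.exp(-half) * total)


def subsampled_iqr_scores(group_a, group_b, n_subsamples=20,
                          subsample_size=10, n_perm=200, seed=0):
    """Per-feature Fisher-combined p-values over repeated subsamples.

    group_a, group_b: lists of samples (rows), each a list of feature values.
    Smaller scores mean stronger combined evidence that the feature's IQR differs.
    """
    rng = random.Random(seed)
    scores = []
    for f in range(len(group_a[0])):
        col_a = [row[f] for row in group_a]
        col_b = [row[f] for row in group_b]
        pvals = []
        for _ in range(n_subsamples):
            sub_a = rng.sample(col_a, subsample_size)  # draw without replacement
            sub_b = rng.sample(col_b, subsample_size)
            pvals.append(iqr_perm_pvalue(sub_a, sub_b, n_perm, rng))
        scores.append(fisher_combined_pvalue(pvals))
    return scores


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical toy data: 40 samples per state, 2 features.
    # Feature 0 has equal spread in both states; feature 1 is ~5x more dispersed in B.
    group_a = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
    group_b = [[data_rng.gauss(0, 1), data_rng.gauss(0, 5)] for _ in range(40)]
    for f, p in enumerate(subsampled_iqr_scores(group_a, group_b)):
        print(f"feature {f}: combined p = {p:.3g}")
```

Subsamples drawn from the same pool overlap, so the per-subsample p-values are not independent. Fisher's chi-square reference is therefore only approximate here, and the combined value is best read as a ranking score for features rather than a calibrated p-value. The abstract describes selecting a small feature set that improves subtype clustering. That selection and clustering step is not sketched.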