Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel.
Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
Genome Biol. 2024 Apr 15;25(1):95. doi: 10.1186/s13059-024-03225-7.
Aneuploidy, an abnormal number of chromosomes within a cell, is a hallmark of cancer. Patterns of aneuploidy differ across cancers, yet are similar in cancers affecting closely related tissues. The selection pressures underlying aneuploidy patterns are not fully understood, hindering our understanding of cancer development and progression.
Here, we apply interpretable machine learning methods to study tissue-selective aneuploidy patterns. We define 20 types of features corresponding to genomic attributes of chromosome-arms, normal tissues, primary tumors, and cancer cell lines (CCLs), and use them to model gains and losses of chromosome arms in 24 cancer types. To reveal the factors that shape the tissue-specific cancer aneuploidy landscapes, we interpret the machine learning models by estimating the relative contribution of each feature to the models. While confirming known drivers of positive selection, our quantitative analysis highlights the importance of negative selection for shaping aneuploidy landscapes. This is exemplified by tumor suppressor gene density being a better predictor of gain patterns than oncogene density, and vice versa for loss patterns. We also identify the importance of tissue-selective features and demonstrate them experimentally, revealing KLF5 as an important driver for chr13q gain in colon cancer. Further supporting an important role for negative selection in shaping the aneuploidy landscapes, we find compensation by paralogs to be among the top predictors of chromosome arm loss prevalence and demonstrate this relationship for one paralog interaction. Similar factors shape aneuploidy patterns in human CCLs, demonstrating their relevance for aneuploidy research.
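The idea of "estimating the relative contribution of each feature" can be illustrated with a minimal sketch. The abstract does not name the model class or the attribution method, so the code below uses stand-ins chosen only for illustration: an ordinary least-squares model and permutation importance. The data are synthetic, and the feature names (`tsg_density`, `oncogene_density`, `unrelated_feature`) and effect sizes are hypothetical, not values from the study.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, w1, w2, ...]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def permutation_importance(w, X, y, rng, n_repeats=20):
    """Mean increase in MSE when one feature column is shuffled."""
    base = mse(y, predict(w, X))
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            increase += mse(y, predict(w, X_perm)) - base
        scores.append(increase / n_repeats)
    return scores

# Synthetic data: each row stands for one chromosome arm (hypothetical features).
FEATURES = ["tsg_density", "oncogene_density", "unrelated_feature"]
rng = random.Random(0)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
# Assumed toy relationship: gain prevalence falls with TSG density and rises
# weakly with oncogene density. The coefficients are arbitrary.
y = [-2.0 * t + 0.5 * o + rng.gauss(0, 0.05) for t, o, _ in X]

w = fit_linear(X, y)
importance = permutation_importance(w, X, y, rng)
total = sum(max(v, 0.0) for v in importance)
relative = {name: max(v, 0.0) / total for name, v in zip(FEATURES, importance)}

for name in FEATURES:
    print(f"{name}: {relative[name]:.3f}")
```

In this toy setup, the feature with the strongest assumed effect receives the largest share of importance. That ranking reflects only how the synthetic data were generated and says nothing about the study's actual results.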
Our quantitative, interpretable machine learning models improve the understanding of the genomic properties that shape cancer aneuploidy landscapes.