Aslam Jai, Ardanza-Trevijano Sergio, Xiong Jingwei, Arsuaga Javier, Sazdanovic Radmila
Department of Mathematics, NC State University, Raleigh, NC 27695, USA.
Department of Physics and Applied Mathematics, University of Navarra, 31008 Pamplona, Spain.
Entropy (Basel). 2022 Jun 29;24(7):896. doi: 10.3390/e24070896.
Copy number changes play an important role in the development of cancer and are commonly associated with changes in gene expression. Persistence curves, such as Betti curves, have been used to detect copy number changes; however, these curves are known to be unstable with respect to small perturbations in the data. We address the stability of lifespan and Betti curves by bounding, in terms of the bottleneck distance, the distance between persistence curves of Vietoris-Rips filtrations built on the original data and on slightly perturbed data. Next, we perform simulations to compare the ability of Betti curves, lifespan curves (conditionally stable) and stable persistence landscapes to detect copy number aberrations. We use these methods to identify significant chromosome regions associated with the four major molecular subtypes of breast cancer: Luminal A, Luminal B, Basal and HER2 positive. The identified segments are then used as predictor variables to build machine learning models that classify patients into one of the four subtypes. We find that no single persistence curve outperforms the others and instead suggest a complementary approach using a suite of persistence curves. In this study, we identified new cytobands associated with three of the subtypes: 1q21.1-q25.2, 2p23.2-p16.3 and 23q26.2-q28 with the Basal subtype; 8p22-p11.1 with Luminal B; and 2q12.1-q21.1 and 5p14.3-p12 with Luminal A. These segments are validated by the TCGA BRCA cohort dataset, except for those found for Luminal A.
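As a rough illustration of the two curve types named in the abstract, the sketch below evaluates a Betti curve and a lifespan curve from a list of persistence intervals. It assumes the common definitions (the Betti curve counts intervals alive at parameter t, and the lifespan curve sums the lengths of those intervals) and the half-open convention birth <= t < death; the function names and the toy diagram are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (birth, death) of one topological feature


def betti_curve(diagram: List[Interval], grid: List[float]) -> List[int]:
    """Count the intervals alive at each grid value t (birth <= t < death)."""
    return [sum(1 for b, d in diagram if b <= t < d) for t in grid]


def lifespan_curve(diagram: List[Interval], grid: List[float]) -> List[float]:
    """Sum the lengths (death - birth) of the intervals alive at each t."""
    return [sum(d - b for b, d in diagram if b <= t < d) for t in grid]


# Toy persistence diagram, invented purely for illustration.
diagram = [(0.0, 1.0), (0.0, 3.0), (2.0, 4.0)]
grid = [0.0, 1.0, 2.0, 3.0]

print(betti_curve(diagram, grid))     # [2, 1, 2, 1]
print(lifespan_curve(diagram, grid))  # [4.0, 3.0, 5.0, 2.0]
```

In this toy example the lifespan curve weights each alive feature by its persistence, so short-lived intervals contribute less than they do to the Betti curve, which counts every alive interval equally.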