Qiu Yushan, Jiang Hao, Ching Wai-Ki
IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2714-2723. doi: 10.1109/TCBB.2020.2992605. Epub 2021 Dec 8.
Clustering tumor metastasis samples from genome-wide gene expression data remains difficult, particularly when the number of experimental samples is small and the number of genes is very large. Rather than predicting tumor metastasis directly, we focus on predicting the epithelial-mesenchymal transition (EMT), an underlying mechanism of tumor metastasis, to avoid the confounding effects of uncertainties arising from various factors. We propose a novel model for predicting EMT that is based on multidimensional scaling (MDS) and integrates entropy and random matrix detection strategies to determine the optimal number of dimensions of the reduced, low-dimensional space. We validated the proposed model on gene expression data from breast cancer EMT samples, and the experimental results demonstrated its superiority over state-of-the-art clustering methods. Furthermore, we developed a novel feature extraction method for selecting significant genes and predicting tumor metastasis. The source code is available at "https://github.com/yushanqiu/yushan.qiu-szu.edu.cn".
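To make the idea of MDS with a data-driven choice of dimension concrete, the following is a minimal sketch and not the authors' implementation. It assumes classical (Torgerson) MDS on Euclidean distances between samples, and it uses the exponential of the Shannon entropy of the normalized eigenvalue spectrum (the "effective rank") as a stand-in for the paper's entropy criterion. The random matrix detection step is not sketched. The function names `classical_mds` and `effective_dimension` and the synthetic data are hypothetical.

```python
import numpy as np


def classical_mds(X, k):
    """Embed the rows of X (samples x genes) into k dimensions.

    Returns the embedding and all eigenvalues of the double-centered
    Gram matrix, sorted in decreasing order.
    """
    # Squared Euclidean distances between every pair of samples.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)

    # Double centering turns squared distances into a Gram matrix.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J

    # Symmetric eigendecomposition, largest eigenvalues first.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Clip tiny negative eigenvalues caused by round-off.
    top = np.clip(vals[:k], 0.0, None)
    return vecs[:, :k] * np.sqrt(top), vals


def effective_dimension(vals):
    """Assumed criterion: exp(Shannon entropy) of the positive spectrum."""
    pos = vals[vals > 1e-10 * vals.max()]
    p = pos / pos.sum()
    entropy = -np.sum(p * np.log(p))
    return int(round(float(np.exp(entropy))))


# Synthetic stand-in for an expression matrix: 6 samples, 50 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))

_, spectrum = classical_mds(X, 2)
k = effective_dimension(spectrum)   # dimension suggested by the entropy rule
Y, _ = classical_mds(X, k)          # low-dimensional sample coordinates
```

The rows of `Y` could then be passed to any clustering routine. Whether the effective-rank rule matches the paper's entropy criterion is not stated in the abstract, so the repository above should be treated as the reference.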