Sehhati Mohammadreza, Tabatabaiefar Mohammad Amin, Gholami Ali Haji, Sattari Mohammad
Medical Image and Signal Processing Research Center, Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
Department of Genetics and Molecular Biology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
J Med Signals Sens. 2022 May 12;12(2):122-126. doi: 10.4103/jmss.jmss_117_21. eCollection 2022 Apr-Jun.
Breast cancer originates in the breast tissue and affects about 10% of women at some stage of their lives. In this study, we applied a new method to predict breast cancer recurrence from biological networks built from gene expression data.
The method consists of four steps: data collection, clustering, determination of differentiating genes, and classification. Eight techniques were implemented on 12,172 genes and 200 samples: random forest, support vector machine, neural network, random forest + k-means, hidden Markov model, joint mutual information, neural network + k-means, and support vector machine + k-means.
Thirty genes were identified as differentiating genes and used for classification. The results showed that random forest + k-means performed better than the other techniques. Two techniques, neural network + k-means and random forest + k-means, performed better than the others in identifying high-risk cases.
Thirty of the 12,172 genes were selected for classification, and the use of clustering improved the performance of the classification techniques.
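The gene selection and clustering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how differentiating genes were scored or how k-means was combined with the classifiers, so the sketch assumes a standardized difference in class means as the gene score and assumes the k-means cluster label is appended as an extra feature. The data are synthetic, all function names are hypothetical, and the classifier itself (for example a random forest) is left out; it would be trained on the returned feature rows.

```python
import random
from statistics import mean, pstdev


def gene_scores(X, y):
    """Score each gene by the absolute difference in class means,
    divided by the overall spread (an assumed, simple criterion)."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos + neg) or 1.0  # guard against zero spread
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def select_top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes."""
    scores = gene_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm) with randomly chosen initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each sample to its nearest centroid.
        labels = [
            min(
                range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def build_features(X, y, n_genes=30, n_clusters=2):
    """Keep the top genes and append the cluster label as one extra feature
    (an assumed way of combining k-means with a classifier)."""
    top = select_top_genes(X, y, n_genes)
    reduced = [[row[g] for g in top] for row in X]
    clusters = kmeans(reduced, n_clusters)
    return top, [row + [float(c)] for row, c in zip(reduced, clusters)]


# Synthetic stand-in for an expression matrix: 40 samples x 200 genes,
# where only the first 5 genes differ between the two classes.
rng = random.Random(42)
N_SAMPLES, N_GENES, N_INFORMATIVE = 40, 200, 5
y = [i % 2 for i in range(N_SAMPLES)]  # 1 = recurrence, 0 = no recurrence
X = [
    [rng.gauss(4.0 * label if g < N_INFORMATIVE else 0.0, 1.0) for g in range(N_GENES)]
    for label in y
]

top_genes, features = build_features(X, y, n_genes=10, n_clusters=2)
```

On the study's data the same calls would use `n_genes=30` on a 200 x 12,172 matrix. The gene scores should be computed on training samples only, otherwise the selection step leaks label information into the evaluation.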