Department of Computer Engineering and Information Technology, ABES Engineering College, Ghaziabad, Uttar Pradesh, India.
Department of CS&IT, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India.
J Integr Bioinform. 2020 Dec 29;18(2):139-153. doi: 10.1515/jib-2019-0110.
Breast cancer is a leading cause of death among women. It is induced by genetic mutations in breast cancer cells. Genetic testing has become a popular way to detect such mutations, but the test is relatively expensive for many patients in developing countries such as India. A genetic test also takes between 2 and 4 weeks to confirm the cancer, and this delay can worsen the prognosis because some patients have a high rate of cancerous cell growth. In this work, a cost- and time-efficient method is proposed that uses machine learning techniques to predict gene expression levels from the clinical outcomes of the patient. An improved SVM-RFE_MI gene selection technique is proposed to find the most significant genes related to breast cancer; explained variance statistical analysis is then applied to extract the genes that have high variance. Least Absolute Shrinkage and Selection Operator (LASSO) and Ridge regression techniques are used to predict the gene expression level. The proposed method predicts the expression of the significant genes with a reduced Root Mean Square Error (RMSE) and an acceptable adjusted R-squared value. According to the study, analysis of the selected genes can help diagnose breast cancer at an earlier stage, at reduced cost and in less time.
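The prediction stage summarized above (high-variance gene filtering, then LASSO and Ridge regression of each gene's expression on clinical covariates, scored by RMSE and adjusted R-squared) can be illustrated with a small NumPy sketch. It is a hedged illustration under stated assumptions, not the authors' implementation. The improved SVM-RFE_MI selection step is not reproduced. The explained-variance step is approximated by keeping the highest-variance genes until they account for a chosen share of the total variance (the `share` parameter is an assumption). LASSO is fitted by coordinate descent and Ridge in closed form, both on standardized clinical covariates. All function names, penalty values (`alpha`) and the synthetic data are hypothetical. For brevity, metrics are computed on the training data; a real evaluation would use held-out patients.

```python
import numpy as np
from functools import partial


def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (X - mu) / sd


def high_variance_genes(expr, share=0.9):
    """Indices of the highest-variance genes that together hold >= `share` of the
    total variance (an assumed stand-in for the explained-variance step)."""
    var = expr.var(axis=0)
    order = np.argsort(var)[::-1]                  # genes by descending variance
    cum = np.cumsum(var[order]) / var.sum()        # cumulative share of variance
    n_keep = int(np.searchsorted(cum, share)) + 1  # shortest prefix reaching `share`
    return order[:n_keep]


def ridge_fit(X, y, alpha):
    """Closed-form ridge on standardized X; the intercept is mean(y)."""
    y_mean = y.mean()
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ (y - y_mean)), y_mean


def lasso_fit(X, y, alpha, n_iter=500):
    """Coordinate descent for (1/2n)||y - b - Xw||^2 + alpha * ||w||_1."""
    n, d = X.shape
    y_mean = y.mean()
    w = np.zeros(d)
    r = y - y_mean                    # residual for w = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0:
                continue              # a constant column carries no signal
            r += X[:, j] * w[j]       # remove feature j from the current fit
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            r -= X[:, j] * w[j]       # add the updated feature j back
    return w, y_mean


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def adjusted_r2(y, y_hat, p):
    """R^2 corrected for p predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    n = len(y)
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return float(1 - (1 - r2) * (n - 1) / (n - p - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_clin, n_genes = 120, 5, 40
    clinical = standardize(rng.normal(size=(n, n_clin)))  # hypothetical clinical covariates
    expr = rng.normal(scale=0.3, size=(n, n_genes))       # synthetic expression matrix
    expr[:, :3] += clinical @ rng.normal(scale=2.0, size=(n_clin, 3))  # 3 clinically driven genes
    models = {"LASSO": partial(lasso_fit, alpha=0.05), "Ridge": partial(ridge_fit, alpha=1.0)}
    for g in high_variance_genes(expr, share=0.9).tolist():
        y = expr[:, g]
        for name, fit in models.items():
            w, b = fit(clinical, y)
            y_hat = clinical @ w + b
            print(f"gene {g:2d} {name}: RMSE={rmse(y, y_hat):.3f}, "
                  f"adjusted R2={adjusted_r2(y, y_hat, n_clin):.3f}")
```

Adjusted R-squared scales the unexplained share of variance by (n - 1)/(n - p - 1), so it penalizes adding clinical predictors that do not improve the fit. This is why it is reported alongside RMSE when comparing models.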