A Markov chain-based feature extraction method for classification and identification of cancerous DNA sequences.

Author information

Khodaei Amin, Feizi-Derakhshi Mohammad-Reza, Mozaffari-Tazehkand Behzad

Affiliation

Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran.

Publication information

Bioimpacts. 2021;11(2):87-99. doi: 10.34172/bi.2021.16. Epub 2020 Mar 24.

Abstract

In recent decades, the rising incidence of cancer has become a major concern for most societies. Because cancer has genetic origins, studying the internal structure of its genes is essential to understanding the disease. In this research, cancer data are analyzed on the basis of DNA sequences. The transition probabilities of consecutive nucleotide pairs in DNA sequences have the Markov property. This property motivates reducing the feature dimension of DNA sequences to overcome the high computational overhead of gene analysis, and it is the basis of the mapping used in this research. The mapping decreases the feature dimension while conserving the basic properties needed to discriminate cancerous from non-cancerous genes. The results showed that a non-linear support vector machine (SVM) classifier with radial basis function (RBF) and polynomial kernel functions can discriminate selected cancerous samples from non-cancerous ones. Experimental results based on 10-fold cross-validation and the accuracy metric verified that the proposed method has low computational overhead and high accuracy. The proposed algorithm was also successfully tested on case studies from related research. In general, combining the proposed Markov-based feature reduction with a non-linear SVM classifier can be considered one of the best methods for discriminating cancerous from non-cancerous genes.
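The classification stage can be sketched in the same spirit. The code below shows the two kernel functions named in the abstract in their standard forms, RBF K(x, y) = exp(-γ‖x − y‖²) and polynomial K(x, y) = (γ⟨x, y⟩ + c)^d, together with a shuffled 10-fold split and an accuracy score. The SVM optimizer itself is omitted; in practice one would typically use a library implementation such as scikit-learn's `SVC(kernel="rbf")` or `SVC(kernel="poly")`. The function names and default hyperparameters (`gamma`, `degree`, `coef0`, `seed`) are hypothetical and are not the values used in the paper.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=3, coef0=1.0, gamma=1.0):
    """Polynomial kernel: K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (gamma * dot + coef0) ** degree


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if n_samples < k:
        raise ValueError("need at least k samples for k-fold cross-validation")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f_i, fold in enumerate(folds) if f_i != i for j in fold)
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Typical use: for each (train, test) pair, fit an SVM on the training feature vectors, predict labels for the test vectors, and average `accuracy` over the 10 folds.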

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d54/8022238/6548668bcecb/bi-11-87-g001.jpg
