Salim A, Amjesh R, Chandra S S Vinod
Department of Computer Science, College of Engineering Trivandrum, Sreekaryam, Thiruvananthapuram, India.
Department of Computational Biology and BioInformatics, University of Kerala, Karyavattom, Thiruvananthapuram, India.
BMC Cancer. 2017 Jan 25;17(1):77. doi: 10.1186/s12885-016-3042-2.
MicroRNAs are single-stranded, non-coding RNA sequences 18-24 nucleotides in length. They play an important role in the post-transcriptional regulation of gene expression. Evidence is emerging that microRNAs act as promoters or suppressors of several diseases, including cancer. Recent studies have shown that microRNAs are differentially expressed in disease states compared with normal states. MicroRNA profiling is a good way to estimate the differences in expression levels, which can then be used to understand the progression of an associated disease.
Machine learning techniques, applied to microRNA expression values obtained from next-generation sequencing (NGS) data, can be used to develop an effective disease prediction system. This paper discusses an approach to microRNA expression profiling and normalization, and a support-vector-based machine learning technique for developing a cancer prediction system. At present, the system has been trained with data samples of hepatocellular carcinoma, carcinoma of the bladder, and lung cancer. MicroRNAs related to specific types of cancer were used to build the classifier.
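The abstract does not say which normalization the authors applied. As a minimal sketch only, the block below assumes a common choice for NGS count data: scaling each sample to reads per million (RPM) and then taking a log2 transform with a pseudocount. The function names, the pseudocount, and the placeholder microRNA names and counts are illustrative assumptions, not taken from the paper.

```python
import math


def reads_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million.

    `counts` maps a microRNA name to its raw read count in that sample.
    (Assumed normalization; the paper's actual method is not stated.)
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: c * 1_000_000 / total for name, c in counts.items()}


def log2_transform(rpm, pseudocount=1.0):
    """Compress the dynamic range; the pseudocount avoids log2(0)."""
    return {name: math.log2(value + pseudocount) for name, value in rpm.items()}


# Toy sample with placeholder names and made-up counts, for illustration only.
sample = {"miR-A": 500, "miR-B": 1500, "miR-C": 0}
rpm = reads_per_million(sample)
features = log2_transform(rpm)
```

Values normalized in this way are comparable across samples with different sequencing depths, which is what a classifier trained on expression profiles would need as input.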
When the system is trained and tested with 10-fold cross-validation, the prediction accuracy is 97.56% for lung cancer, 97.82% for hepatocellular carcinoma, and 95.0% for carcinoma of the bladder. The system was further validated with separate test sets, on which accuracies exceeded 90%. A ranking based on differential expression indicates the relative significance of each microRNA in the prediction process.
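The two procedures mentioned here, k-fold splitting and ranking by differential expression, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: differential expression is scored as the log2 ratio of group means with a pseudocount (the abstract does not define its ranking score), the fold assignment is a simple unshuffled, unstratified round-robin, and no classifier is trained. All function names, microRNA names, and values are hypothetical.

```python
import math
from statistics import mean


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k round-robin folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def rank_by_fold_change(tumor, normal, pseudocount=1.0):
    """Rank microRNAs by |log2(mean tumor / mean normal)|, largest first.

    `tumor` and `normal` map a microRNA name to a list of normalized
    expression values, one per sample. (Assumed scoring scheme.)
    """
    scores = {}
    for name in tumor:
        t = mean(tumor[name]) + pseudocount
        n = mean(normal[name]) + pseudocount
        scores[name] = math.log2(t / n)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy expression values with placeholder names, for illustration only.
tumor = {"miR-A": [7.0, 7.0], "miR-B": [3.0, 3.0], "miR-C": [0.0, 0.0]}
normal = {"miR-A": [1.0, 1.0], "miR-B": [3.0, 3.0], "miR-C": [7.0, 7.0]}
ranking = rank_by_fold_change(tumor, normal)

# Each sample appears in exactly one test fold across the 10 splits.
splits = list(k_fold_indices(25, k=10))
```

In a full pipeline, a classifier (a support vector machine in the paper) would be fitted on each training split and scored on the matching test split, and the reported accuracy would be computed over the 10 folds.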
The experimental results show that microRNA expression profiling is an effective mechanism for disease identification, provided that a sufficiently large database is available.