Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

Authors

Kumar Mukesh, Rath Nitish Kumar, Rath Santanu Kumar

Affiliation

Department of Computer Science and Engineering, NIT Rourkela, Orissa 769008, India.

Publication information

J Biomed Inform. 2016 Apr;60:395-409. doi: 10.1016/j.jbi.2016.03.002. Epub 2016 Mar 11.

Abstract

Microarray-based gene expression profiling has emerged as an efficient technique for the classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. These datasets often contain a large amount of expression data, but only a fraction of it comprises genes that are significantly expressed. The precise identification of the genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is employed to classify the microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is performed on these MapReduce-based models using microarray datasets of various dimensions. The results show that these models consume much less execution time than conventional models when processing big data.
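The two-phase pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which statistical tests or distance metric were used, so the Welch t-statistic and Euclidean distance below are assumptions, plain Python functions stand in for Hadoop mappers and reducers, and every function name and all toy expression values are hypothetical.

```python
import heapq
import math
from collections import Counter
from functools import reduce


def welch_t(a, b):
    """Absolute Welch t-statistic for two lists of expression values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return abs(ma - mb) / denom if denom else 0.0


def map_gene(gene_idx, samples, labels):
    """Map step (feature selection): emit (score, gene index) for one gene."""
    pos = [s[gene_idx] for s, y in zip(samples, labels) if y == 1]
    neg = [s[gene_idx] for s, y in zip(samples, labels) if y == 0]
    return (welch_t(pos, neg), gene_idx)


def select_genes(samples, labels, n_keep):
    """Reduce step: keep the n_keep genes with the largest statistic."""
    scored = [map_gene(g, samples, labels) for g in range(len(samples[0]))]
    return sorted(g for _, g in heapq.nlargest(n_keep, scored))


def map_knn(partition, query, k):
    """Map step (KNN): the k nearest training samples inside one partition."""
    dists = [(math.dist(x, query), y) for x, y in partition]
    return heapq.nsmallest(k, dists)


def reduce_knn(left, right, k):
    """Reduce step: merge two candidate lists into the k nearest overall."""
    return heapq.nsmallest(k, left + right)


def mr_knn_predict(partitions, query, k):
    """Classify one query by majority vote over the merged k nearest."""
    local = [map_knn(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: reduce_knn(a, b, k), local)
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Toy data (hypothetical): 6 samples x 4 genes; genes 0 and 2 separate the classes.
samples = [
    [1.0, 5.0, 0.9, 3.0],
    [1.2, 4.0, 1.1, 3.1],
    [0.8, 6.0, 1.0, 2.9],
    [3.0, 5.1, 3.2, 3.0],
    [3.1, 4.2, 2.9, 3.2],
    [2.9, 5.9, 3.1, 2.8],
]
labels = [0, 0, 0, 1, 1, 1]

genes = select_genes(samples, labels, n_keep=2)
train = [([s[g] for g in genes], y) for s, y in zip(samples, labels)]
partitions = [train[0::2], train[1::2]]  # two simulated data splits


def classify(raw_sample, k=3):
    """Project a raw sample onto the selected genes, then run the KNN vote."""
    return mr_knn_predict(partitions, [raw_sample[g] for g in genes], k)


print(genes, classify([2.8, 5.0, 3.0, 3.0]), classify([1.1, 5.0, 1.0, 3.0]))
```

Each mapper only needs to return its local k nearest candidates, because the global k nearest neighbors must all appear in some partition's local top k. That keeps the data passed to the reducer small regardless of the training-set size.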
