Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

Author information

Kumar Mukesh, Rath Nitish Kumar, Rath Santanu Kumar

Affiliation

Department of Computer Science and Engineering, NIT Rourkela, Orissa 769008, India.

Publication information

J Biomed Inform. 2016 Apr;60:395-409. doi: 10.1016/j.jbi.2016.03.002. Epub 2016 Mar 11.

DOI:10.1016/j.jbi.2016.03.002
PMID:26975600
Abstract

Microarray-based gene expression profiling has emerged as an efficient technique for the classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfy both the veracity and velocity properties of big data, as they keep changing with time. Therefore, analyzing microarray datasets in a short amount of time is essential. These datasets often contain a large amount of expression data, but only a small fraction of the genes are significantly expressed. Precise identification of the genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. In this paper, various MapReduce-based statistical methods (tests) are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest-neighbor (mrKNN) classifier is employed to classify the microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis of these MapReduce-based models is performed using microarray datasets of various dimensions. The results show that these models consume much less execution time than conventional models when processing big data.
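The two-phase pipeline described in the abstract (MapReduce feature selection, then mrKNN classification) can be sketched as follows. This is a minimal, single-process illustration under several assumptions, not the authors' implementation: `map_reduce` is a hypothetical stand-in for a Hadoop job (no Hadoop API is used); the per-gene test is a Welch t-statistic chosen purely for illustration, since the abstract does not name the specific tests; mrKNN is assumed to use Euclidean distance and majority voting; and the data are assumed to have exactly two class labels (e.g. ALL/AML). All function names and the toy data are hypothetical.

```python
import heapq
import math
from collections import Counter
from functools import reduce


def map_reduce(partitions, mapper, reducer):
    """Hypothetical single-process stand-in for a Hadoop job:
    run `mapper` on every partition, then fold the outputs with `reducer`."""
    return reduce(reducer, (mapper(p) for p in partitions))


# ---- Phase 1: MapReduce feature selection (Welch t-statistic, assumed) ----

def stats_mapper(partition):
    """Map: per-class (count, per-gene sums, per-gene sums of squares)."""
    out = {}
    for x, y in partition:
        n, s, ss = out.get(y, (0, [0.0] * len(x), [0.0] * len(x)))
        out[y] = (n + 1,
                  [a + v for a, v in zip(s, x)],
                  [a + v * v for a, v in zip(ss, x)])
    return out


def stats_reducer(a, b):
    """Reduce: add two partial statistics dictionaries class by class."""
    merged = dict(a)
    for y, (n, s, ss) in b.items():
        if y in merged:
            n0, s0, ss0 = merged[y]
            merged[y] = (n0 + n,
                         [p + q for p, q in zip(s0, s)],
                         [p + q for p, q in zip(ss0, ss)])
        else:
            merged[y] = (n, s, ss)
    return merged


def welch_t(stats, gene, c0, c1):
    """|Welch t| for one gene; each class needs at least two samples."""
    def mean_var(c):
        n, s, ss = stats[c]
        m = s[gene] / n
        return n, m, max(ss[gene] - n * m * m, 0.0) / (n - 1)
    n0, m0, v0 = mean_var(c0)
    n1, m1, v1 = mean_var(c1)
    # Small epsilon avoids division by zero for zero-variance genes.
    return abs(m0 - m1) / (math.sqrt(v0 / n0 + v1 / n1) + 1e-12)


def select_genes(partitions, top):
    """Indices of the `top` genes ranked by |t| (exactly two classes assumed)."""
    stats = map_reduce(partitions, stats_mapper, stats_reducer)
    c0, c1 = sorted(stats)
    n_genes = len(stats[c0][1])
    scores = [welch_t(stats, g, c0, c1) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top]


# ---- Phase 2: mrKNN-style classification (Euclidean distance, assumed) ----

def mr_knn_classify(partitions, query, genes, k=3):
    """Map: local k nearest per partition; reduce: global k nearest; then vote."""
    q = [query[g] for g in genes]

    def mapper(partition):
        return heapq.nsmallest(
            k, ((math.dist([x[g] for g in genes], q), y) for x, y in partition))

    def reducer(a, b):
        return heapq.nsmallest(k, a + b)

    neighbours = map_reduce(partitions, mapper, reducer)
    return Counter(y for _, y in neighbours).most_common(1)[0][0]


# Toy data split into three partitions (analogous to HDFS blocks):
# each sample is (expression vector over 3 genes, class label).
partitions = [
    [([1.0, 5.0, 0.2], "ALL"), ([1.2, 5.1, 0.1], "ALL")],
    [([3.0, 5.0, 0.3], "AML"), ([3.1, 4.9, 0.2], "AML")],
    [([0.9, 5.2, 0.2], "ALL"), ([2.9, 5.0, 0.1], "AML")],
]

genes = select_genes(partitions, top=1)
print(genes, mr_knn_classify(partitions, [2.95, 5.0, 0.2], genes, k=3))
# prints: [0] AML
```

In this sketch each mapper emits only small summaries (per-class sufficient statistics, or at most k candidate neighbours per partition), so the reducer's input stays small no matter how many samples a partition holds. That is the general reason MapReduce formulations of such methods can spread work across cluster nodes; the actual speedups reported for mrKNN come from the paper's Hadoop experiments, not from this toy code.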

Similar articles

1
Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.
J Biomed Inform. 2016 Apr;60:395-409. doi: 10.1016/j.jbi.2016.03.002. Epub 2016 Mar 11.
2
Chaotic genetic algorithm for gene selection and classification problems.
OMICS. 2009 Oct;13(5):407-20. doi: 10.1089/omi.2009.0007.
3
Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis.
BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S24. doi: 10.1186/1471-2164-9-S2-S24.
4
Feature selection and nearest centroid classification for protein mass spectrometry.
BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.
5
Quadratic regression analysis for gene discovery and pattern recognition for non-cyclic short time-course microarray experiments.
BMC Bioinformatics. 2005 Apr 25;6:106. doi: 10.1186/1471-2105-6-106.
6
Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.
PLoS One. 2015 Mar 30;10(3):e0120364. doi: 10.1371/journal.pone.0120364. eCollection 2015.
7
Classification of microarray data with factor mixture models.
Bioinformatics. 2006 Jan 15;22(2):202-8. doi: 10.1093/bioinformatics/bti779. Epub 2005 Nov 15.
8
Interpretable gene expression classifier with an accurate and compact fuzzy rule base for microarray data analysis.
Biosystems. 2006 Sep;85(3):165-76. doi: 10.1016/j.biosystems.2006.01.002. Epub 2006 Feb 21.
9
A hybrid BPSO-CGA approach for gene selection and classification of microarray data.
J Comput Biol. 2012 Jan;19(1):68-82. doi: 10.1089/cmb.2010.0064. Epub 2011 Jan 6.
10
A hybrid feature selection method for DNA microarray data.
Comput Biol Med. 2011 Apr;41(4):228-37. doi: 10.1016/j.compbiomed.2011.02.004. Epub 2011 Mar 3.

Cited by

1
A Dual Level Analysis with Evolutionary Computing and Swarm Models for Classification of Leukemia.
Biomed Res Int. 2022 May 26;2022:2052061. doi: 10.1155/2022/2052061. eCollection 2022.
2
Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform.
Sci Rep. 2018 Dec 12;8(1):17787. doi: 10.1038/s41598-018-36180-y.