Framework for Parallel Preprocessing of Microarray Data Using Hadoop.

Authors

Sahlabadi Amirhossein, Chandren Muniyandi Ravie, Sahlabadi Mahdi, Golshanbafghy Hossein

Affiliations

Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia.

Faculty of Creative Multimedia, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia.

Publication information

Adv Bioinformatics. 2018 Mar 29;2018:9391635. doi: 10.1155/2018/9391635. eCollection 2018.

DOI: 10.1155/2018/9391635
PMID: 29796018
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5896349/
Abstract

Microarray technology has become one of the most widely used tools for studying gene expression and diagnosing disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed because they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard methods used to preprocess these data and remove noise. Most preprocessing algorithms, however, are time-consuming and cannot handle large datasets with thousands of experiments. Parallel processing can address both problems. Hadoop is a well-known framework for distributed storage and processing that provides a parallel environment in which to run such experiments. In this research, for the first time, the capabilities of Hadoop and the statistical power of R are combined to parallelize RMA so that microarray data can be processed efficiently. The experiment was run on a cluster of 5 nodes, each with 16 cores and 16 GB of memory. It compares the efficiency and performance of RMA parallelized with Hadoop against RMA parallelized with the affyPara package and against sequential RMA. The results show that the proposed approach outperforms both the sequential approach and the affyPara approach in speed-up.
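To make the idea of parallelizing RMA more concrete, the following is a minimal sketch, not the authors' Hadoop/R implementation (which the abstract does not describe). It assumes the standard description of RMA as background correction, quantile normalization, and median-polish summarization, and illustrates only the quantile-normalization step in a map/reduce style: per-array work is independent (map), while the reference distribution needs values from every array (reduce). All function names are hypothetical, and the speed-up formula (sequential time divided by parallel time) is the usual definition, assumed here rather than taken from the paper.

```python
from statistics import mean


def map_sort_array(intensities):
    """Map step: sort one array's probe intensities (independent per array)."""
    return sorted(intensities)


def reduce_reference(sorted_arrays):
    """Reduce step: average the k-th smallest value across all arrays."""
    return [mean(rank_values) for rank_values in zip(*sorted_arrays)]


def map_apply_reference(intensities, reference):
    """Second map step: replace each value with the reference value at its rank.

    Ties are broken by original position; this is a simplification.
    """
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    normalized = [0.0] * len(intensities)
    for rank, idx in enumerate(order):
        normalized[idx] = reference[rank]
    return normalized


def quantile_normalize(arrays):
    """Run the map, reduce, map phases sequentially (stand-in for a cluster job)."""
    sorted_arrays = [map_sort_array(a) for a in arrays]
    reference = reduce_reference(sorted_arrays)
    return [map_apply_reference(a, reference) for a in arrays]


def speed_up(t_sequential, t_parallel):
    """Assumed definition: sequential runtime divided by parallel runtime."""
    return t_sequential / t_parallel


# Toy data: three arrays with three probes each (made-up values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array shares the same set of values, differing only in which probe holds which value. In a real cluster setting the two map steps could be distributed across nodes, with only the comparatively small reference vector exchanged between them.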

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/966c12963809/ABI2018-9391635.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/41ec12ff513f/ABI2018-9391635.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/747df758b895/ABI2018-9391635.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/fc51d74e30bc/ABI2018-9391635.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/dbabe546aba9/ABI2018-9391635.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/2224f5893049/ABI2018-9391635.006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/d74089f66160/ABI2018-9391635.007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c8/5896349/c1a44f9fa333/ABI2018-9391635.008.jpg