The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies.

Authors

Verhaak Roel G W, Staal Frank J T, Valk Peter J M, Lowenberg Bob, Reinders Marcel J T, de Ridder Dick

Affiliation

Department of Hematology, Erasmus Medical Center, Rotterdam, The Netherlands.

Publication

BMC Bioinformatics. 2006 Mar 2;7:105. doi: 10.1186/1471-2105-7-105.

DOI:10.1186/1471-2105-7-105

PMID:16512908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1481623/

Abstract

BACKGROUND

Intensity values measured by Affymetrix microarrays have to be both normalized, to be able to compare different microarrays by removing non-biological variation, and summarized, generating the final probe set expression values. Various pre-processing techniques, such as dChip, GCRMA, RMA and MAS, have been developed for this purpose. This study assesses the effect of applying different pre-processing methods on the results of analyses of large Affymetrix datasets. By focusing on practical applications of microarray-based research, this study provides insight into the relevance of pre-processing procedures to biology-oriented researchers.
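The two steps named above, normalization and summarization, can be illustrated with a minimal sketch. It is not the dChip, GCRMA, RMA or MAS algorithm: it assumes plain quantile normalization across arrays followed by a per-array median of log2 probe intensities, and the function names, probe-set names and intensity values are all hypothetical.

```python
import math
from statistics import mean, median


def quantile_normalize(arrays):
    """Give every array the same empirical intensity distribution.

    `arrays` is a list of equal-length lists of probe intensities.
    Ties are broken by probe position (a simplification).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity across arrays at each rank.
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n_probes)]
    normalized = []
    for a in arrays:
        # order[r] is the index of the probe that holds rank r in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def summarize(array, probe_sets):
    """Collapse probe-level intensities into one log2 value per probe set.

    Uses a simple median; real methods use model-based summaries.
    """
    return {
        name: median(math.log2(array[i]) for i in idxs)
        for name, idxs in probe_sets.items()
    }


# Hypothetical data: two arrays, four probes, two probe sets.
raw = [[2.0, 8.0, 4.0, 16.0], [4.0, 16.0, 8.0, 32.0]]
probe_sets = {"ps_a": [0, 1], "ps_b": [2, 3]}

normalized = quantile_normalize(raw)
expression = [summarize(a, probe_sets) for a in normalized]
```

In this toy input the second array is uniformly twice as bright as the first, so after normalization both arrays carry identical values and the probe-set summaries agree; that array-wide brightness difference is the kind of non-biological variation normalization is meant to remove.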

RESULTS

Using two publicly available datasets, i.e., gene-expression data of 285 patients with Acute Myeloid Leukemia (AML, Affymetrix HG-U133A GeneChip) and 42 samples of tumor tissue of the embryonal central nervous system (CNS, Affymetrix HuGeneFL GeneChip), we tested the effect of the four pre-processing strategies mentioned above on (1) expression level measurements, (2) detection of differential expression, (3) cluster analysis and (4) classification of samples. In most cases, the effect of pre-processing is relatively small compared to other choices made in an analysis for the AML dataset, but has a more profound effect on the outcome of the CNS dataset. Analyses on individual probe sets, such as testing for differential expression, are affected most; supervised, multivariate analyses such as classification are far less sensitive to pre-processing.
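One way to make "sensitivity to pre-processing" concrete for per-probe-set tests is to rank probe sets by a test statistic under two pre-processing outputs and measure how much the top lists overlap. The sketch below is an illustration only, not the procedure used in the study: it assumes Welch's t-statistic and a top-k overlap fraction, and every name and number in it is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic; assumes at least one group has non-zero variance."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def top_probe_sets(expr, labels, k):
    """Return the k probe sets with the largest |t| between classes 0 and 1."""
    scores = {}
    for name, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[name] = abs(welch_t(a, b))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def overlap(expr_x, expr_y, labels, k):
    """Fraction of the top-k probe sets shared by two pre-processed versions."""
    shared = top_probe_sets(expr_x, labels, k) & top_probe_sets(expr_y, labels, k)
    return len(shared) / k


# Hypothetical log2 expression values for six samples (three per class),
# as they might come out of two different pre-processing methods.
labels = [0, 0, 0, 1, 1, 1]
method_x = {
    "ps1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "ps2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "ps3": [5.0, 5.5, 4.5, 5.2, 4.8, 5.1],
}
method_y = {
    "ps1": [1.1, 1.3, 1.0, 2.8, 3.0, 2.7],
    "ps2": [2.0, 2.3, 1.7, 2.1, 1.9, 2.0],
    "ps3": [5.0, 5.4, 4.6, 5.1, 4.9, 5.0],
}

shared_top1 = overlap(method_x, method_y, labels, k=1)
```

An overlap near 1 means the two pre-processing methods nominate largely the same probe sets; a low overlap would correspond to the stronger method dependence the abstract reports for analyses on individual probe sets.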

CONCLUSION

Using two experimental datasets, we show that the choice of pre-processing method is of relatively minor influence on the final analysis outcome of large microarray studies whereas it can have important effects on the results of a smaller study. The data source (platform, tissue homogeneity, RNA quality) is potentially of bigger importance than the choice of pre-processing method.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc23/1481623/bba0e1488cc9/1471-2105-7-105-1.jpg

Similar articles

A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis.

Bioinformatics. 2005 Mar 1;21(5):631-43. doi: 10.1093/bioinformatics/bti033. Epub 2004 Sep 16.

Interactively optimizing signal-to-noise ratios in expression profiling: project-specific algorithm selection and detection p-value weighting in Affymetrix microarrays.

Bioinformatics. 2004 Nov 1;20(16):2534-44. doi: 10.1093/bioinformatics/bth280. Epub 2004 Apr 29.

SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

BMC Bioinformatics. 2010 Feb 25;11:108. doi: 10.1186/1471-2105-11-108.

A robust meta-classification strategy for cancer diagnosis from gene expression data.

Proc IEEE Comput Syst Bioinform Conf. 2005:322-5. doi: 10.1109/csb.2005.7.

BMC Bioinformatics. 2007 Jun 16;8:206. doi: 10.1186/1471-2105-8-206.

Classification using partial least squares with penalized logistic regression.

Bioinformatics. 2005 Apr 1;21(7):1104-11. doi: 10.1093/bioinformatics/bti114. Epub 2004 Nov 5.

Smoothing blemished gene expression microarray data via missing value imputation.

Annu Int Conf IEEE Eng Med Biol Soc. 2008;2008:5688-91. doi: 10.1109/IEMBS.2008.4650505.

Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation.

BMC Bioinformatics. 2007 Jan 18;8:14. doi: 10.1186/1471-2105-8-14.

Robust multi-tissue gene panel for cancer detection.

BMC Cancer. 2010 Jun 22;10:319. doi: 10.1186/1471-2407-10-319.

Cited by

Prediction of early breast cancer patient survival using ensembles of hypoxia signatures.

PLoS One. 2018 Sep 14;13(9):e0204123. doi: 10.1371/journal.pone.0204123. eCollection 2018.

Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences.

BMC Bioinformatics. 2014 Jun 6;15:170. doi: 10.1186/1471-2105-15-170.

Unifying gene expression measures from multiple platforms using factor analysis.

PLoS One. 2011 Mar 11;6(3):e17691. doi: 10.1371/journal.pone.0017691.

A comprehensive sensitivity analysis of microarray breast cancer classification under feature variability.

BMC Bioinformatics. 2009 Nov 26;10:389. doi: 10.1186/1471-2105-10-389.

Comparative analysis of methods for gene transcription profiling data derived from different microarray technologies in rat and mouse models of diabetes.

BMC Genomics. 2009 Feb 5;10:63. doi: 10.1186/1471-2164-10-63.

New candidate genes for sex-comb divergence between Drosophila mauritiana and Drosophila simulans.

Genetics. 2007 Aug;176(4):2561-76. doi: 10.1534/genetics.106.067686. Epub 2007 Jun 11.

Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis.

BMC Bioinformatics. 2006 Nov 22;7:511. doi: 10.1186/1471-2105-7-511.
