A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data.

Affiliation

Systems Analytics Inc., Waltham, MA, USA.

Publication information

Pharmacogenomics J. 2010 Aug;10(4):278-91. doi: 10.1038/tpj.2010.57.

DOI:10.1038/tpj.2010.57

PMID:20676067

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2920074/

Abstract

Batch effects are the systematic non-biological differences between batches (groups) of samples in microarray experiments, arising from causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on the development of methods for effective batch effect removal. However, the impact of these methods on cross-batch prediction performance, which is one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform better than or equivalent to no batch effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that the application of these methods is generally advisable and that ratio-based methods are preferred.
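To make two of the simpler methods and the performance metric named in the abstract concrete, the following is a minimal sketch in plain Python. It shows per-batch mean-centering, per-batch standardization, and the MCC for binary labels. The function names `adjust_per_batch` and `mcc` are hypothetical, and the definitions are the generic textbook ones. The abstract does not specify the authors' exact procedures, so this should not be read as their implementation, and the ratio-based methods and EJLR are not covered here.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def adjust_per_batch(values, batches, scale=False):
    """Mean-center (and optionally standardize) one gene within each batch.

    values  -- expression values for a single gene, one per sample
    batches -- batch label for each sample, same length as values
    scale   -- if True, also divide by the within-batch standard deviation
    """
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    centers = {b: mean(v) for b, v in by_batch.items()}
    # Assumption: fall back to 1.0 for constant batches to avoid dividing by zero.
    spreads = {b: (pstdev(v) or 1.0) for b, v in by_batch.items()}

    adjusted = []
    for value, batch in zip(values, batches):
        centered = value - centers[batch]
        adjusted.append(centered / spreads[batch] if scale else centered)
    return adjusted


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a cross-batch setting, an adjustment like this would be applied gene by gene to the training and test batches before fitting a classifier such as SVM or KNN, and the MCC would then be computed on the test-batch predictions.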

Similar articles

Reproducibility of microarray data: a further analysis of microarray quality control (MAQC) data.

BMC Bioinformatics. 2007 Oct 25;8:412. doi: 10.1186/1471-2105-8-412.

Feature selection and classification of MAQC-II breast cancer and multiple myeloma microarray gene expression data.

PLoS One. 2009 Dec 11;4(12):e8250. doi: 10.1371/journal.pone.0008250.

Consistency of predictive signature genes and classifiers generated using different microarray platforms.

Pharmacogenomics J. 2010 Aug;10(4):247-57. doi: 10.1038/tpj.2010.34.

Selecting a single model or combining multiple models for microarray-based classifier development?--a comparative analysis based on large and diverse datasets generated from the MAQC-II project.

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S3. doi: 10.1186/1471-2105-12-S10-S3.

Cross-platform comparison of SYBR Green real-time PCR with TaqMan PCR, microarrays and other gene expression measurement technologies evaluated in the MicroArray Quality Control (MAQC) study.

BMC Genomics. 2008 Jul 11;9:328. doi: 10.1186/1471-2164-9-328.

Rat toxicogenomic study reveals analytical consistency across microarray platforms.

Nat Biotechnol. 2006 Sep;24(9):1162-9. doi: 10.1038/nbt1238.

Performance comparison of two microarray platforms to assess differential gene expression in human monocyte and macrophage cells.

BMC Genomics. 2008 Jun 25;9:302. doi: 10.1186/1471-2164-9-302.

The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Nat Biotechnol. 2010 Aug;28(8):827-38. doi: 10.1038/nbt.1665. Epub 2010 Jul 30.

Evaluation of gene expression data generated from expired Affymetrix GeneChip® microarrays using MAQC reference RNA samples.

BMC Bioinformatics. 2010 Oct 7;11 Suppl 6(Suppl 6):S10. doi: 10.1186/1471-2105-11-S6-S10.

Cited by

Interpretable Machine Learning for Cross-Cohort Prediction of Motor Fluctuations in Parkinson's Disease.

Mov Disord. 2025 Aug;40(8):1604-1617. doi: 10.1002/mds.30223. Epub 2025 May 14.

High performance data integration for large-scale analyses of incomplete Omic profiles using Batch-Effect Reduction Trees (BERT).

Nat Commun. 2025 Aug 2;16(1):7104. doi: 10.1038/s41467-025-62237-4.

Multi-cohort machine learning identifies predictors of cognitive impairment in Parkinson's disease.

NPJ Digit Med. 2025 Jul 26;8(1):482. doi: 10.1038/s41746-025-01862-1.

Plasma Proteomic Profiling of a Group of Anxious Dogs by LC-MS/MS: A Case-Control Study.

Proteomics Clin Appl. 2025 Jul 4:e70014. doi: 10.1002/prca.70014.

Deciphering the Oncogenic Landscape of Hepatocytes Through Integrated Single-Nucleus and Bulk RNA-Seq of Hepatocellular Carcinoma.

Adv Sci (Weinh). 2025 Apr;12(14):e2412944. doi: 10.1002/advs.202412944. Epub 2025 Feb 17.

Assessing and mitigating batch effects in large-scale omics studies.

Genome Biol. 2024 Oct 3;25(1):254. doi: 10.1186/s13059-024-03401-9.

Comparison and development of cross-study normalization methods for inter-species transcriptional analysis.

PLoS One. 2024 Sep 10;19(9):e0307997. doi: 10.1371/journal.pone.0307997. eCollection 2024.

Particle uptake in cancer cells can predict malignancy and drug resistance using machine learning.

Sci Adv. 2024 May 31;10(22):eadj4370. doi: 10.1126/sciadv.adj4370. Epub 2024 May 29.

Tensor modeling of MRSA bacteremia cytokine and transcriptional patterns reveals coordinated, outcome-associated immunological programs.

PNAS Nexus. 2024 May 4;3(5):pgae185. doi: 10.1093/pnasnexus/pgae185. eCollection 2024 May.

Multiomic Signatures of Traffic-Related Air Pollution in London Reveal Potential Short-Term Perturbations in Gut Microbiome-Related Pathways.

Environ Sci Technol. 2024 May 21;58(20):8771-8782. doi: 10.1021/acs.est.3c09148. Epub 2024 May 10.
