Selecting a single model or combining multiple models for microarray-based classifier development?--a comparative analysis based on large and diverse datasets generated from the MAQC-II project.

Affiliation

Center for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Rd, Jefferson, Arkansas, USA.

Publication information

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S3. doi: 10.1186/1471-2105-12-S10-S3.

DOI:10.1186/1471-2105-12-S10-S3

PMID:22166133

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3236846/

Abstract

BACKGROUND

Genomic biomarkers play an increasing role in both preclinical and clinical applications. Development of genomic biomarkers with microarrays is an area of intensive investigation. However, despite sustained and continuing effort, developing microarray-based predictive models (i.e., genomic biomarkers) capable of reliable prediction for an observed or measured outcome (i.e., endpoint) of unknown samples in preclinical and clinical practice remains a considerable challenge. No straightforward guidelines exist for selecting a single model that will perform best when presented with unknown samples. In the second phase of the MicroArray Quality Control (MAQC-II) project, 36 analysis teams produced a large number of models for 13 preclinical and clinical endpoints. Before external validation was performed, each team nominated one model per endpoint (referred to here as 'nominated models') from which MAQC-II experts selected 13 'candidate models' to represent the best model for each endpoint. Both the nominated and candidate models from MAQC-II provide benchmarks to assess other methodologies for developing microarray-based predictive models.

METHODS

We developed a simple ensemble method by taking a number of the top performing models from cross-validation and developing an ensemble model for each of the MAQC-II endpoints. We compared the ensemble models with both nominated and candidate models from MAQC-II using blinded external validation.
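
The general idea of the method above can be illustrated with a minimal sketch. The abstract does not state how many top models were kept or how their predictions were combined, so the choice of k = 3 and the majority-vote rule below are assumptions, and every model name, score, and threshold is hypothetical toy data rather than anything from the MAQC-II study.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical model type: maps a feature vector to a class label (0 or 1).
Model = Callable[[Sequence[float]], int]


def select_top_models(cv_scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k models with the highest cross-validation score.

    Ties in score are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(cv_scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def ensemble_predict(models: Dict[str, Model], chosen: List[str],
                     sample: Sequence[float]) -> int:
    """Combine the chosen models by majority vote (an assumed combination rule).

    If several labels share the top vote count, the label predicted by the
    best-ranked model among them wins.
    """
    if not chosen:
        raise ValueError("at least one model is required")
    votes = [models[name](sample) for name in chosen]  # in rank order
    counts = Counter(votes)
    top_count = max(counts.values())
    tied = {label for label, count in counts.items() if count == top_count}
    for vote in votes:
        if vote in tied:
            return vote
    raise AssertionError("unreachable: votes is non-empty")


# Toy models and made-up cross-validation scores, for illustration only.
models: Dict[str, Model] = {
    "gene_a_threshold": lambda x: int(x[0] > 0.5),
    "gene_b_threshold": lambda x: int(x[1] > 0.5),
    "gene_sum_threshold": lambda x: int(x[0] + x[1] > 1.2),
    "always_negative": lambda x: 0,
}
cv_scores = {
    "gene_a_threshold": 0.81,
    "gene_b_threshold": 0.78,
    "gene_sum_threshold": 0.84,
    "always_negative": 0.50,
}

top = select_top_models(cv_scores, k=3)
prediction = ensemble_predict(models, top, [0.9, 0.2])
```

In this sketch the ensemble is built only from cross-validation results, so an external validation set can remain blinded until the final comparison, in line with the evaluation design described above.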

RESULTS

For 10 of the 13 MAQC-II endpoints originally analyzed by the MAQC-II data analysis team from the National Center for Toxicological Research (NCTR), the ensemble models achieved predictive performance equal to or better than that of the NCTR nominated models. Additionally, the ensemble models had performance comparable to the MAQC-II candidate models. Most ensemble models also had better performance than the nominated models generated by five other MAQC-II data analysis teams that analyzed all 13 endpoints.

CONCLUSIONS

Our findings suggest that an ensemble method can often attain a higher average predictive performance in an external validation set than a corresponding "optimized" model method. Using an ensemble method to determine a final model is a potentially important supplement to the good modeling practices recommended by the MAQC-II project for developing microarray-based genomic biomarkers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b86b/3236846/12a4fc234968/1471-2105-12-S10-S3-1.jpg
