Statistical aspects of omics data analysis using the random compound covariate.

Author information

Su Pei-Fang, Chen Xi, Chen Heidi, Shyr Yu

Affiliation

Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA.

Publication information

BMC Syst Biol. 2012;6 Suppl 3(Suppl 3):S11. doi: 10.1186/1752-0509-6-S3-S11. Epub 2012 Dec 17.

DOI:10.1186/1752-0509-6-S3-S11

PMID:23281681

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3524312/

Abstract

BACKGROUND

Dealing with high-dimensional markers, such as gene expression data obtained using microarray chip technology or genomics studies, is a key challenge because the number of features greatly exceeds the number of biological samples. After biologically relevant genes have been selected, how to summarize their expression and then build a predictive model is an important issue in medical applications. One intuitive approach assigns different weights to different features and then combines this information into a single score, named the compound covariate. Investigators commonly use this score to assess whether the compound covariate is associated with clinical outcomes after adjusting for baseline covariates. However, we found that some clinical papers concerned with such analysis report biased p-values based on a flawed compound covariate in their training data set.
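The weighted-sum construction described above can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function name `compound_covariate`, the toy expression values, and the gene weights are all assumptions made up for the example. In practice the weights would be estimated from data (for instance from per-gene test statistics or regression coefficients), which the abstract does not specify.

```python
from typing import List, Sequence


def compound_covariate(
    expression: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return one weighted-sum score per sample.

    expression: rows are samples, columns are the selected genes.
    weights: one weight per selected gene (hypothetical values here).
    """
    scores = []
    for row in expression:
        if len(row) != len(weights):
            raise ValueError("each sample needs one value per gene weight")
        # Compound covariate for this sample: sum over genes of weight * expression.
        scores.append(sum(w * x for w, x in zip(weights, row)))
    return scores


# Toy data: 3 samples, 2 selected genes (made-up numbers).
expression = [
    [1.0, 2.0],
    [0.5, -1.0],
    [2.0, 0.0],
]
weights = [2.0, -1.0]  # hypothetical gene weights

scores = compound_covariate(expression, weights)
print(scores)  # [0.0, 2.0, 4.0]
```

Each sample's score would then enter a downstream model, such as a survival regression adjusted for baseline covariates, as a single covariate in place of the individual genes.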

RESULTS

We correct this flaw in the analysis and also propose treating the compound score as a random covariate, which yields more appropriate results and significantly improves study power for survival outcomes. Using the proposed method, we thoroughly assess the performance of two commonly used gene-weight estimates through simulation studies. When the sample size is 100 and the censoring rates are 50%, 30%, and 10%, treating the compound score as a random rather than a fixed covariate increases power by 10.6%, 3.5%, and 0.4%, respectively. Finally, we assess the proposed method using two publicly available microarray data sets.

CONCLUSION

In this article, we correct this flaw in the analysis, and the proposed method, which treats the compound score as a random covariate, can achieve more appropriate results and improve study power for survival outcomes.
