Modeling Between-Study Heterogeneity for Improved Replicability in Gene Signature Selection and Clinical Prediction.

Authors

Rashid Naim U, Li Quefeng, Yeh Jen Jen, Ibrahim Joseph G

Affiliations

Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.

Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.

Publication information

J Am Stat Assoc. 2020;115(531):1125-1138. doi: 10.1080/01621459.2019.1671197. Epub 2019 Oct 29.

Abstract

In the genomic era, the identification of gene signatures associated with disease is of significant interest. Such signatures are often used to predict clinical outcomes in new patients and aid clinical decision-making. However, recent studies have shown that gene signatures are often not replicable. This occurrence has practical implications regarding the generalizability and clinical applicability of such signatures. To improve replicability, we introduce a novel approach to select gene signatures from multiple datasets whose effects are consistently non-zero and account for between-study heterogeneity. We build our model upon some rank-based quantities, facilitating integration over different genomic datasets. A high dimensional penalized Generalized Linear Mixed Model (pGLMM) is used to select gene signatures and address data heterogeneity. We compare our method to some commonly used strategies that select gene signatures ignoring between-study heterogeneity. We provide asymptotic results justifying the performance of our method and demonstrate its advantage in the presence of heterogeneity through thorough simulation studies. Lastly, we motivate our method through a case study subtyping pancreatic cancer patients from four gene expression studies.
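
As a rough illustration of why rank-based quantities ease integration across datasets, the sketch below builds binary indicators from within-sample comparisons of gene pairs. The abstract does not say which rank-based quantities the authors use, so the pairwise-comparison form, the function name `pairwise_rank_indicators`, and the toy expression values are all assumptions made for illustration. The point it shows is only that such indicators do not change when a study's measurements are rescaled by an increasing transformation.

```python
import numpy as np


def pairwise_rank_indicators(expr, pairs):
    """Hypothetical helper: within-sample gene-pair comparisons.

    expr  : (n_samples, n_genes) array of expression values
    pairs : list of (i, j) gene index pairs
    Returns an (n_samples, n_pairs) 0/1 array, 1 where gene i > gene j.
    """
    expr = np.asarray(expr, dtype=float)
    i_idx = np.array([p[0] for p in pairs])
    j_idx = np.array([p[1] for p in pairs])
    # Compare the two genes of each pair inside every sample.
    return (expr[:, i_idx] > expr[:, j_idx]).astype(int)


# Toy data (made up): the same two samples and three genes,
# reported on two different measurement scales.
study_a = np.array([[5.0, 2.0, 9.0],
                    [1.0, 4.0, 3.0]])
study_b = np.log2(study_a * 100.0 + 1.0)  # increasing transformation

pairs = [(0, 1), (0, 2), (1, 2)]
features_a = pairwise_rank_indicators(study_a, pairs)
features_b = pairwise_rank_indicators(study_b, pairs)

# The indicators agree across scales because only within-sample
# orderings are used.
print(np.array_equal(features_a, features_b))
```

Features of this kind could then serve as predictors in a model with study-level random effects. The penalized GLMM itself is not sketched here, since the abstract gives no details of its penalty or fitting procedure.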
