Kernel based methods for accelerated failure time model with ultra-high dimensional data.

Affiliation

University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA.

Publication information

BMC Bioinformatics. 2010 Dec 21;11:606. doi: 10.1186/1471-2105-11-606.

DOI:10.1186/1471-2105-11-606
PMID:21176134
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3019227/
Abstract

BACKGROUND

Most genomic data are ultra-high dimensional, with more than 10,000 genes (probes). Regularization methods with L₁ and Lₚ penalties have been extensively studied in survival analysis with high-dimensional genomic data. However, when the sample size n is much smaller than the number of genes m (n << m), directly identifying a small subset of genes from ultra-high dimensional data (m > 10,000) is time-consuming and computationally inefficient. In current microarray analysis, the common practice is to select a few thousand (or a few hundred) genes using univariate analysis or statistical tests, and then apply a LASSO-type penalty to further reduce the number of disease-associated genes. This two-step procedure may introduce bias and inaccuracy and can miss biologically important genes.

RESULTS

The accelerated failure time (AFT) model is a linear regression model and a useful alternative to the Cox model for survival analysis. In this paper, we propose a nonlinear kernel-based AFT model and an efficient variable selection method with adaptive kernel ridge regression. Our proposed variable selection method is based on the kernel matrix and the dual problem, which involves a much smaller n × n matrix. It is very efficient when the number of unknown variables (genes) is much larger than the number of samples. Moreover, the primal variables are explicitly updated and the sparsity in the solution is exploited.
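The efficiency argument above (solving an n × n dual problem instead of an m × m primal one) can be illustrated with ordinary ridge regression and a linear kernel. The following is a minimal sketch under stated assumptions, not the authors' adaptive kernel ridge algorithm: it ignores censoring, uses synthetic data, and the names `ridge_primal`, `ridge_dual`, and `lam` are hypothetical. It only shows that the dual solution, computed from the n × n kernel matrix K = XXᵀ, recovers the same per-gene weights as the primal solution.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Ridge regression solved in the primal: an m x m linear system."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

def ridge_dual(X, y, lam):
    """Same problem solved through the linear kernel K = X X^T (n x n)."""
    n = X.shape[0]
    K = X @ X.T                                      # n x n kernel (Gram) matrix
    alpha = np.linalg.solve(K + lam * np.eye(n), y)  # dual coefficients, one per sample
    w = X.T @ alpha                                  # explicit primal weights, one per gene
    return w, alpha

# Synthetic, uncensored example (assumption): n samples, m >> n "genes".
rng = np.random.default_rng(0)
n, m = 30, 500
X = rng.standard_normal((n, m))
w_true = np.zeros(m)
w_true[:5] = 2.0                                     # only 5 genes carry signal
log_t = X @ w_true + 0.1 * rng.standard_normal(n)    # log survival time, AFT-style

w_dual, alpha = ridge_dual(X, log_t, lam=1.0)
w_primal = ridge_primal(X, log_t, lam=1.0)
print("primal and dual weights agree:", np.allclose(w_primal, w_dual))
```

The dual route only ever factorizes a 30 × 30 matrix here, whereas the primal route factorizes a 500 × 500 one; with more than 10,000 genes the gap grows accordingly. A plain ridge penalty does not produce sparse weights, so the variable selection described in the abstract would need the adaptive scheme from the paper itself.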

CONCLUSIONS

Our proposed methods can simultaneously identify survival-associated prognostic factors and predict survival outcomes with ultra-high dimensional genomic data. We have demonstrated the performance of our methods with both simulated and real data. In the limited computational studies conducted, the proposed method performed very well.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13e1/3019227/2b1216248cbf/1471-2105-11-606-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13e1/3019227/e76c733b536f/1471-2105-11-606-1.jpg

Similar articles

1. Kernel based methods for accelerated failure time model with ultra-high dimensional data.
   BMC Bioinformatics. 2010 Dec 21;11:606. doi: 10.1186/1471-2105-11-606.
2. The L(1/2) regularization approach for survival analysis in the accelerated failure time model.
   Comput Biol Med. 2015 Sep;64:283-90. doi: 10.1016/j.compbiomed.2014.09.002. Epub 2014 Sep 18.
3. Robust sparse accelerated failure time model for survival analysis.
   Technol Health Care. 2018;26(S1):55-63. doi: 10.3233/THC-174141.
4. Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction.
   Bioinformatics. 2004 Nov 22;20(17):3185-95. doi: 10.1093/bioinformatics/bth383. Epub 2004 Jul 1.
5. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data.
   Bioinformatics. 2005 Jul 1;21(13):3001-8. doi: 10.1093/bioinformatics/bti422. Epub 2005 Apr 6.
6. Doubly penalized Buckley-James method for survival data with high-dimensional covariates.
   Biometrics. 2008 Mar;64(1):132-40. doi: 10.1111/j.1541-0420.2007.00877.x. Epub 2007 Aug 3.
7. Cancer survival analysis using semi-supervised learning method based on Cox and AFT models with L1/2 regularization.
   BMC Med Genomics. 2016 Mar 1;9:11. doi: 10.1186/s12920-016-0169-6.
8. Bayesian variable selection for the analysis of microarray data with censored outcomes.
   Bioinformatics. 2006 Sep 15;22(18):2262-8. doi: 10.1093/bioinformatics/btl362. Epub 2006 Jul 15.
9. Survival analysis with high-dimensional covariates: an application in microarray studies.
   Stat Appl Genet Mol Biol. 2009;8(1):Article 14. doi: 10.2202/1544-6115.1423. Epub 2009 Feb 11.
10. A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size.
   PLoS One. 2015 Mar 4;10(3):e0118198. doi: 10.1371/journal.pone.0118198. eCollection 2015.

Cited by

1. Pathway aggregation for survival prediction via multiple kernel learning.
   Stat Med. 2018 Jul 20;37(16):2501-2515. doi: 10.1002/sim.7681. Epub 2018 Apr 17.
2. Multilevel regularized regression for simultaneous taxa selection and network construction with metagenomic count data.
   Bioinformatics. 2015 Apr 1;31(7):1067-74. doi: 10.1093/bioinformatics/btu778. Epub 2014 Nov 20.
3. Omnibus risk assessment via accelerated failure time kernel machine modeling.
   Biometrics. 2013 Dec;69(4):861-73. doi: 10.1111/biom.12098. Epub 2013 Nov 6.

References

1. Gene identification and survival prediction with Lp Cox regression and novel similarity measure.
   Int J Data Min Bioinform. 2009;3(4):398-408. doi: 10.1504/ijdmb.2009.029203.
2. Survival prediction and gene identification with penalized global AUC maximization.
   J Comput Biol. 2009 Dec;16(12):1661-70. doi: 10.1089/cmb.2008.0188.
3. Additive risk survival model with microarray data.
   BMC Bioinformatics. 2007 Jun 8;8:192. doi: 10.1186/1471-2105-8-192.
4. Bayesian variable selection for the analysis of microarray data with censored outcomes.
   Bioinformatics. 2006 Sep 15;22(18):2262-8. doi: 10.1093/bioinformatics/btl362. Epub 2006 Jul 15.
5. Cross-validated Cox regression on microarray gene expression data.
   Stat Med. 2006 Sep 30;25(18):3201-16. doi: 10.1002/sim.2353.
6. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells.
   N Engl J Med. 2004 Nov 18;351(21):2159-69. doi: 10.1056/NEJMoa041869.
7. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma.
   N Engl J Med. 2002 Jun 20;346(25):1937-47. doi: 10.1056/NEJMoa012914.
8. Bayesian accelerated failure time analysis with application to veterinary epidemiology.
   Stat Med. 2000 Jan 30;19(2):221-37. doi: 10.1002/(sici)1097-0258(20000130)19:2<221::aid-sim328>3.0.co;2-c.
9. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis.
   Stat Med. 1992 Oct-Nov;11(14-15):1871-9. doi: 10.1002/sim.4780111409.