Network-constrained group lasso for high-dimensional multinomial classification with application to cancer subtype prediction.

Authors

Tian Xinyu, Wang Xuefeng, Chen Jun

Affiliations

Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA.

Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA; Department of Preventive Medicine, Stony Brook University, Stony Brook, NY, USA.

Publication information

Cancer Inform. 2015 Jan 12;13(Suppl 6):25-33. doi: 10.4137/CIN.S17686. eCollection 2014.

DOI: 10.4137/CIN.S17686
PMID: 25635165
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4295837/
Abstract

The classic multinomial logit model, commonly used for multiclass regression problems, is restricted to a small number of predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related by an underlying biological network. Efficient use of this network information is important for improving both classification performance and biological interpretability. We proposed a multinomial logit model that addresses both the high dimensionality of the predictors and the underlying network information. A group lasso penalty was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared with models that use no prior structure information, both in simulations and in a cancer subtype prediction problem using real gene expression data from The Cancer Genome Atlas (TCGA). The network-constrained model outperformed the traditional ones in both cases.
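
The approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each feature's coefficients across all classes form one group for the group lasso penalty, that the network constraint is a quadratic penalty built from an unnormalized graph Laplacian L, and that a fixed step size derived from a Lipschitz bound is adequate. The function names, the penalty weights lam1 and lam2, and the chain-graph toy network are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def objective(B, X, Y, L, lam1, lam2):
    """Multinomial negative log-likelihood + network penalty + group lasso."""
    P = softmax(X @ B)
    nll = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    network = 0.5 * lam2 * np.sum(B * (L @ B))        # sum_k b_k' L b_k
    group = lam1 * np.sum(np.linalg.norm(B, axis=1))  # one group per feature (assumed)
    return nll + network + group

def prox_group_lasso(B, t):
    """Row-wise soft-thresholding: shrink each feature's coefficient row."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12)) * B

def fit(X, y, L, lam1, lam2, n_classes, n_iter=300):
    """Proximal gradient descent with a fixed step 1 / (Lipschitz bound)."""
    n, p = X.shape
    Y = np.eye(n_classes)[y]  # one-hot class labels
    # Bound on the curvature of the smooth part (likelihood + Laplacian term).
    lipschitz = 0.5 * np.linalg.norm(X, 2) ** 2 / n + lam2 * np.linalg.eigvalsh(L).max()
    step = 1.0 / lipschitz
    B = np.zeros((p, n_classes))
    for _ in range(n_iter):
        grad = X.T @ (softmax(X @ B) - Y) / n + lam2 * (L @ B)  # smooth part only
        B = prox_group_lasso(B - step * grad, step * lam1)       # non-smooth part
    return B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, K = 60, 8, 3
    X = rng.standard_normal((n, p))
    y = rng.integers(0, K, size=n)
    # Toy network: features connected in a chain (hypothetical).
    A = np.diag(np.ones(p - 1), 1)
    A = A + A.T
    L = np.diag(A.sum(axis=1)) - A
    B = fit(X, y, L, lam1=0.05, lam2=0.1, n_classes=K)
    print("selected features:", np.flatnonzero(np.linalg.norm(B, axis=1) > 0))
```

The split mirrors the abstract: the likelihood and the network penalty are smooth and are handled by the gradient step, while the non-smooth group lasso term is handled by its proximal operator, which sets whole rows of B to zero and thereby removes a feature from every class at once.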


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e67/4295837/c2b9c52f10bd/cin-suppl.6-2014-025f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e67/4295837/380a15e5d45a/cin-suppl.6-2014-025f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e67/4295837/56b6996c06cd/cin-suppl.6-2014-025f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e67/4295837/3f03b33df28e/cin-suppl.6-2014-025f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e67/4295837/abc334695ffc/cin-suppl.6-2014-025f5.jpg
