Structure-based variable selection for survival data.

Affiliation

Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH) and Computer Science Department, University of Crete, Heraklion, Greece.

Publication details

Bioinformatics. 2010 Aug 1;26(15):1887-94. doi: 10.1093/bioinformatics/btq261. Epub 2010 Jun 2.

DOI:10.1093/bioinformatics/btq261
PMID:20519286
Abstract

MOTIVATION

Variable selection is a typical approach used for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose a new algorithm for variable selection suitable for the analysis of high-dimensional, right-censored data called Survival Max-Min Parents and Children (SMMPC). The algorithm is conceptually simple, scalable, based on the theory of Bayesian networks (BNs) and the Markov blanket and extends the corresponding algorithm (MMPC) for classification tasks. The selected variables have a structural interpretation: if T is the survival time (in general the time-to-event), SMMPC returns the variables adjacent to T in the BN representing the data distribution. The selected variables also have a causal interpretation that we discuss.
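The max-min selection idea behind MMPC, which the abstract says SMMPC extends, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fisher_z_pvalue`, `max_pvalue`, `mmpc`), the parameters `alpha` and `max_k`, and the synthetic data are all hypothetical. The conditional independence test used here is a plain partial-correlation (Fisher z) test, which ignores censoring. Handling right-censored survival times would require swapping in a test that accounts for censoring, for example one based on a Cox regression model (an assumption here; the abstract does not name the test).

```python
import math
from itertools import combinations

import numpy as np


def fisher_z_pvalue(data, x, t, cond):
    """p-value for H0: column x is independent of column t given columns cond.

    Stand-in test based on partial correlation; it ignores censoring.
    """
    n = data.shape[0]
    # Regress both variables on an intercept plus the conditioning columns,
    # then correlate the two residual vectors.
    design = np.column_stack([np.ones(n)] + [data[:, c] for c in cond])
    residuals = []
    for col in (x, t):
        beta, *_ = np.linalg.lstsq(design, data[:, col], rcond=None)
        residuals.append(data[:, col] - design @ beta)
    r = float(np.clip(np.corrcoef(residuals[0], residuals[1])[0, 1],
                      -0.999999, 0.999999))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def max_pvalue(data, x, t, cpc, max_k, test):
    """Largest p-value over all conditioning subsets of cpc with size <= max_k.

    A large p-value means weak association, so this is the 'min association'.
    """
    worst = 0.0
    for k in range(min(max_k, len(cpc)) + 1):
        for cond in combinations(cpc, k):
            worst = max(worst, test(data, x, t, cond))
    return worst


def mmpc(data, t, alpha=0.05, max_k=2, test=fisher_z_pvalue):
    """Return candidate parents and children (column indices) of column t."""
    candidates = [c for c in range(data.shape[1]) if c != t]
    cpc = []
    # Forward phase: repeatedly add the variable whose weakest association
    # with t (over subsets of the current cpc) is the strongest.
    while candidates:
        scores = {x: max_pvalue(data, x, t, cpc, max_k, test)
                  for x in candidates}
        # Variables found independent of t for some subset are dropped for good.
        candidates = [x for x in candidates if scores[x] <= alpha]
        if not candidates:
            break
        best = min(candidates, key=scores.get)
        cpc.append(best)
        candidates.remove(best)
    # Backward phase: remove variables that some subset of the others
    # renders independent of t.
    for x in list(cpc):
        others = [c for c in cpc if c != x]
        if max_pvalue(data, x, t, others, max_k, test) > alpha:
            cpc.remove(x)
    return sorted(cpc)


# Synthetic, uncensored example: x1 -> x0 -> target <- x3, x2 is pure noise.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x0 = x1 + rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
target = x0 + x3 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3, target])

selected = mmpc(data, t=4, alpha=0.001)
print(selected)
```

In this toy network only columns 0 and 3 are adjacent to the target. Column 1 is associated with the target marginally but becomes independent once column 0 is conditioned on, which is the structural interpretation the abstract describes. Passing the test as a parameter keeps the search loop unchanged when a censoring-aware test is substituted.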

RESULTS

We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. On average, SMMPC selects the smallest variable subsets (fewer than a dozen per dataset) while statistically significantly outperforming all methods in the study that return a manageable number of genes, that is, few enough to be inspected by a human expert.

AVAILABILITY

Matlab and R code are freely available from http://www.mensxmachina.org
