Structure-based variable selection for survival data.

Affiliation

Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH) and Computer Science Department, University of Crete, Heraklion, Greece.

Publication information

Bioinformatics. 2010 Aug 1;26(15):1887-94. doi: 10.1093/bioinformatics/btq261. Epub 2010 Jun 2.

Abstract

MOTIVATION

Variable selection is a typical approach used for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose Survival Max-Min Parents and Children (SMMPC), a new variable selection algorithm suitable for the analysis of high-dimensional, right-censored data. The algorithm is conceptually simple and scalable, is based on the theory of Bayesian networks (BNs) and the Markov blanket, and extends the corresponding algorithm for classification tasks (MMPC). The selected variables have a structural interpretation: if T is the survival time (in general, the time-to-event), SMMPC returns the variables adjacent to T in the BN representing the data distribution. The selected variables also have a causal interpretation that we discuss.
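
The abstract does not spell out the steps of the algorithm, so the following is only a minimal Python sketch of the general max-min heuristic used by MMPC-style methods, not the authors' implementation. The names `min_assoc`, `max_min_pc`, `toy_pvalue`, `alpha` and `max_k` are hypothetical. The conditional independence test is passed in as a function; for right-censored data one would presumably plug in a test derived from a survival model, which is an assumption here and is replaced by a hand-written toy oracle.

```python
from itertools import combinations


def min_assoc(x, selected, pvalue, max_k):
    """Weakest association of x with the target T: the largest p-value over
    all conditioning subsets of `selected` with at most max_k variables."""
    worst = 0.0
    for k in range(min(max_k, len(selected)) + 1):
        for cond in combinations(selected, k):
            worst = max(worst, pvalue(x, frozenset(cond)))
    return worst


def max_min_pc(variables, pvalue, alpha=0.05, max_k=2):
    """Sketch of a max-min parents-and-children search around a target T.

    `pvalue(x, cond)` is a hypothetical user-supplied test returning the
    p-value for "x is independent of T given cond".
    """
    selected = []
    candidates = list(variables)
    # Forward phase: repeatedly add the candidate whose weakest
    # association with T is still the strongest (max-min heuristic).
    while candidates:
        scores = {x: min_assoc(x, selected, pvalue, max_k) for x in candidates}
        # Discard candidates found independent of T given some subset.
        candidates = [x for x in candidates if scores[x] <= alpha]
        if not candidates:
            break
        best = min(candidates, key=lambda x: scores[x])
        selected.append(best)
        candidates.remove(best)
    # Backward phase: drop variables that some subset of the others explains.
    for x in list(selected):
        others = [y for y in selected if y != x]
        if min_assoc(x, others, pvalue, max_k) > alpha:
            selected.remove(x)
    return selected


def toy_pvalue(x, cond):
    """Toy oracle (made-up numbers): A and B are adjacent to T,
    C is related to T only through A, and D is unrelated."""
    if x in ("A", "B"):
        return 0.001
    if x == "C":
        return 0.5 if "A" in cond else 0.01
    return 0.8


result = max_min_pc(["A", "B", "C", "D"], toy_pvalue)
```

In this toy setting `C` looks associated with T on its own but is discarded once `A` has been selected, which illustrates the structural interpretation above: only the variables adjacent to T are kept.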

RESULTS

We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. SMMPC selects on average the smallest variable subsets (fewer than a dozen per dataset), while statistically significantly outperforming all methods in the study that return a number of genes small enough to be inspected by a human expert.

AVAILABILITY

Matlab and R code are freely available from http://www.mensxmachina.org
