Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection.

Authors

Pölsterl Sebastian, Conjeti Sailesh, Navab Nassir, Katouzian Amin

Affiliations

Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany.

Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany; Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA.

Publication information

Artif Intell Med. 2016 Sep;72:1-11. doi: 10.1016/j.artmed.2016.07.004. Epub 2016 Jul 29.

Abstract

BACKGROUND

In clinical research, the primary interest is often the time until an adverse event occurs, which is the subject of survival analysis. Applying survival analysis to electronic health records is challenging for two main reasons: (1) patient records consist of high-dimensional feature vectors, and (2) feature vectors are a mix of categorical and real-valued features, which implies that statistical properties vary among features. To learn from high-dimensional data, researchers can choose from a wide range of methods in the fields of feature selection and feature extraction. Whereas feature selection is well studied, little work has focused on utilizing feature extraction techniques for survival analysis.

RESULTS

We investigate how well feature extraction methods can deal with features that have varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which have been developed specifically for such situations. We propose using random survival forests to accurately determine local neighborhood relations from right-censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models, with and without intrinsic feature selection, on 3 clinical datasets. Our results demonstrate that for small sample sizes (fewer than 500 patients), models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (interquartile range: [-1.2%; 14.6%]).
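
The comparison above is reported in terms of the concordance index. As a rough illustration of what that metric measures on right-censored data, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' evaluation code: the function name `concordance_index`, its arguments, and the toy data are assumptions made for this example, and pairs with tied survival times are simply skipped.

```python
from typing import Sequence


def concordance_index(
    time: Sequence[float],
    event: Sequence[bool],
    risk: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored data (illustrative sketch).

    time  -- observed time (event time or censoring time) per patient
    event -- True if the event was observed, False if the record is censored
    risk  -- predicted risk score; higher means an earlier event is expected
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        # A pair is comparable only if the patient with the shorter
        # observed time actually experienced the event.
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:  # ties in time are skipped (simplification)
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; the index is undefined")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one is censored.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, False, True, True]
risks = [0.9, 0.7, 0.1, 0.3]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of comparable pairs. In the toy data, three of the four comparable pairs are ordered correctly.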

CONCLUSIONS

If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes (in our experiments, 2500 samples or more), feature extraction methods perform as well as feature selection methods.
