
Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection.

Authors

Pölsterl Sebastian, Conjeti Sailesh, Navab Nassir, Katouzian Amin

Affiliations

Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany.

Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany; Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA.

Publication information

Artif Intell Med. 2016 Sep;72:1-11. doi: 10.1016/j.artmed.2016.07.004. Epub 2016 Jul 29.

DOI:10.1016/j.artmed.2016.07.004
PMID:27664504
Abstract

BACKGROUND

In clinical research, the primary interest is often the time until an adverse event occurs, which is the subject of survival analysis. Applying survival analysis to electronic health records is challenging for two main reasons: (1) patient records consist of high-dimensional feature vectors, and (2) feature vectors mix categorical and real-valued features, so statistical properties vary among features. To learn from high-dimensional data, researchers can choose from a wide range of feature selection and feature extraction methods. Whereas feature selection is well studied, little work has focused on using feature extraction techniques for survival analysis.

RESULTS

We investigate how well feature extraction methods can deal with features having varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which have been developed specifically for these situations. We propose to use random survival forests to accurately determine local neighborhood relations from right-censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models with and without intrinsic feature selection in the context of survival analysis on 3 clinical datasets. Our results demonstrate that for small sample sizes (fewer than 500 patients), models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (interquartile range: [-1.2%; 14.6%]).
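The abstract reports performance as a concordance index but does not say which estimator was used. As a minimal sketch, assuming Harrell's estimator and a simplified treatment of tied survival times (tied pairs are skipped), the index can be computed from right-censored data as follows. The function name and argument names are illustrative and do not come from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       -- observed time per patient (event or censoring time)
    events      -- True if the event was observed, False if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder so that patient i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time is an observed
        # event; pairs with tied times are skipped (simplifying assumption).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0   # higher risk failed first: correct order
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 8, 12, 20]
events = [True, False, True, True]   # the second patient is censored
risk = [0.9, 0.3, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a margin of a few percentage points in this index reflects a difference in how often pairs of patients are ranked correctly.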

CONCLUSIONS

If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes (2500 samples or more in our experiments), feature extraction methods perform as well as feature selection methods.
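To make "identifying the underlying manifold" concrete, here is a minimal sketch of a single-view spectral embedding (Laplacian eigenmaps) built on a k-nearest-neighbor graph. It is not the authors' method: the paper considers multiview spectral embedding and derives neighborhoods from random survival forests, whereas this sketch assumes plain Euclidean distances, a connected neighbor graph, and NumPy. The function name and parameters are illustrative.

```python
import numpy as np


def spectral_embedding(X, n_components=2, n_neighbors=5):
    """Embed the rows of X using eigenvectors of a k-NN graph Laplacian."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Indices of the k nearest neighbors; column 0 is the sample itself
    # (assuming no duplicate rows), so it is dropped.
    neighbors = np.argsort(sq_dist, axis=1)[:, 1:n_neighbors + 1]
    # Binary adjacency matrix, symmetrized so the graph is undirected.
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, neighbors.ravel()] = 1.0
    W = np.maximum(W, W.T)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; the first eigenvector
    # is trivial for a connected graph and is skipped.
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))        # 30 synthetic samples, 4 features
embedding = spectral_embedding(X)
```

The embedding depends entirely on the neighbor graph, so with few samples the graph is a coarse approximation of the data's local structure, which is consistent with the limitation the conclusions describe for small cohorts.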

Similar articles

1
Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection.
Artif Intell Med. 2016 Sep;72:1-11. doi: 10.1016/j.artmed.2016.07.004. Epub 2016 Jul 29.
2
Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.
J Biomed Inform. 2015 Feb;53:277-90. doi: 10.1016/j.jbi.2014.11.013. Epub 2014 Dec 9.
3
Stabilizing l1-norm prediction models by supervised feature grouping.
J Biomed Inform. 2016 Feb;59:149-68. doi: 10.1016/j.jbi.2015.11.012. Epub 2015 Dec 9.
4
Support Vector Feature Selection for Early Detection of Anastomosis Leakage From Bag-of-Words in Electronic Health Records.
IEEE J Biomed Health Inform. 2016 Sep;20(5):1404-15. doi: 10.1109/JBHI.2014.2361688. Epub 2014 Oct 8.
5
Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data.
J Biomed Inform. 2017 May;69:160-176. doi: 10.1016/j.jbi.2017.04.008. Epub 2017 Apr 12.
6
The feature selection bias problem in relation to high-dimensional gene data.
Artif Intell Med. 2016 Jan;66:63-71. doi: 10.1016/j.artmed.2015.11.001. Epub 2015 Nov 14.
7
A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis.
Funct Integr Genomics. 2024 Aug 19;24(5):139. doi: 10.1007/s10142-024-01415-x.
8
Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression.
Comput Math Methods Med. 2016;2016:2157984. doi: 10.1155/2016/2157984. Epub 2016 Nov 1.
9
A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.
Comput Methods Programs Biomed. 2017 Jul;146:47-57. doi: 10.1016/j.cmpb.2017.05.009. Epub 2017 May 24.
10
A machine learning-based framework to identify type 2 diabetes through electronic health records.
Int J Med Inform. 2017 Jan;97:120-127. doi: 10.1016/j.ijmedinf.2016.09.014. Epub 2016 Oct 1.

Articles citing this paper

1
Developing clinical prognostic models to predict graft survival after renal transplantation: comparison of statistical and machine learning models.
BMC Med Inform Decis Mak. 2025 Feb 3;25(1):54. doi: 10.1186/s12911-025-02906-y.
2
Machine Learning for Precision Epilepsy Surgery.
Epilepsy Curr. 2023 Jan 18;23(2):78-83. doi: 10.1177/15357597221150055. eCollection 2023 Mar-Apr.
3
Evaluation of a decided sample size in machine learning applications.
BMC Bioinformatics. 2023 Feb 14;24(1):48. doi: 10.1186/s12859-023-05156-9.
4
RbQE: An Efficient Method for Content-Based Medical Image Retrieval Based on Query Expansion.
J Digit Imaging. 2023 Jun;36(3):1248-1261. doi: 10.1007/s10278-022-00769-7. Epub 2023 Jan 26.
5
A Complete Process of Text Classification System Using State-of-the-Art NLP Models.
Comput Intell Neurosci. 2022 Jun 9;2022:1883698. doi: 10.1155/2022/1883698. eCollection 2022.
6
Computational Analysis of High-Dimensional DNA Methylation Data for Cancer Prognosis.
J Comput Biol. 2022 Aug;29(8):769-781. doi: 10.1089/cmb.2022.0002. Epub 2022 Jun 6.
7
Machine learning for optimized individual survival prediction in resectable upper gastrointestinal cancer.
J Cancer Res Clin Oncol. 2023 May;149(5):1691-1702. doi: 10.1007/s00432-022-04063-5. Epub 2022 May 26.
8
Body fat prediction through feature extraction based on anthropometric and laboratory measurements.
PLoS One. 2022 Feb 22;17(2):e0263333. doi: 10.1371/journal.pone.0263333. eCollection 2022.
9
Application of Feature Extraction Methods for Chemical Risk Classification in the Pharmaceutical Industry.
Sensors (Basel). 2021 Aug 26;21(17):5753. doi: 10.3390/s21175753.
10
SurvNet: A Novel Deep Neural Network for Lung Cancer Survival Analysis With Missing Values.
Front Oncol. 2021 Jan 20;10:588990. doi: 10.3389/fonc.2020.588990. eCollection 2020.