T-ReCS: stable selection of dynamically formed groups of features with application to prediction of clinical outcomes.

Authors

Huang Grace T, Tsamardinos Ioannis, Raghu Vineet, Kaminski Naftali, Benos Panayiotis V

Affiliations

Department of Computational and Systems Biology, and Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA.

Publication

Pac Symp Biocomput. 2015;20:431-42.

Abstract

Feature selection is used extensively in biomedical research for biomarker identification and patient classification, both of which are essential steps in developing personalized medicine strategies. However, the structured nature of biological datasets and the high correlation among variables frequently yield multiple equally optimal signatures, which makes traditional feature selection methods unstable. Features selected from one cohort of patients may not work as well in another cohort. In addition, biologically important features may be missed because other co-clustered features were selected instead. We propose a new method, Tree-guided Recursive Cluster Selection (T-ReCS), for efficient selection of grouped features. T-ReCS significantly improves predictive stability while maintaining the same level of accuracy. Unlike group lasso, T-ReCS does not require a priori knowledge of the clusters, and it can also handle "orphan" features (those not belonging to any cluster). T-ReCS can be used with categorical or survival target variables. Tested on simulated and real expression data from breast cancer and lung diseases, as well as on survival data, T-ReCS selected stable cluster features without significant loss in classification accuracy.
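The instability described in the abstract can be illustrated with a toy example. The sketch below is not the T-ReCS algorithm: it replaces the tree-guided procedure with a plain correlation threshold, and the feature names (`g1`, `g2`, `g3`), the two small cohorts, the 0.9 threshold, and the use of the Jaccard index as a stability measure are all assumptions made for illustration. It only shows that picking the single best feature can flip between two correlated features from one cohort to the next, whereas reporting the correlated group gives the same answer in both.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)

def top_feature(features, target):
    """Name of the single feature most correlated (in absolute value) with the target."""
    return max(features, key=lambda name: abs(pearson(features[name], target)))

def expand_to_group(features, seed, threshold=0.9):
    """Seed feature plus every feature whose |r| with the seed reaches the threshold.

    Assumption: a fixed threshold stands in for the tree-guided grouping.
    """
    return {name for name, values in features.items()
            if abs(pearson(values, features[seed])) >= threshold}

def jaccard(a, b):
    """Jaccard index of two sets, used here as a simple stability score."""
    return len(a & b) / len(a | b)

# Hypothetical binary outcome for six patients per cohort.
outcome = [0, 0, 0, 1, 1, 1]

# Two hypothetical cohorts: g1 and g2 are highly correlated, g3 is unrelated.
cohort_a = {"g1": [1, 2, 1, 5, 6, 5], "g2": [1, 2, 2, 5, 6, 4], "g3": [3, 1, 2, 2, 3, 1]}
cohort_b = {"g1": [1, 2, 2, 5, 6, 4], "g2": [1, 2, 1, 5, 6, 5], "g3": [3, 1, 2, 2, 3, 1]}

# Single-feature selection: the winner differs between cohorts.
single_a = {top_feature(cohort_a, outcome)}
single_b = {top_feature(cohort_b, outcome)}

# Group selection: expand each winner to its correlated group.
group_a = expand_to_group(cohort_a, top_feature(cohort_a, outcome))
group_b = expand_to_group(cohort_b, top_feature(cohort_b, outcome))

print("single-feature stability:", jaccard(single_a, single_b))
print("group stability:", jaccard(group_a, group_b))
```

In this toy setting the single-feature signatures share nothing across cohorts, while the group signatures coincide. Real data would call for resampling many cohorts and for a grouping that adapts to the data, which is the role the abstract assigns to the tree-guided step.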

Cited by

1
Extending greedy feature selection algorithms to multiple solutions.
Data Min Knowl Discov. 2021;35(4):1393-1434. doi: 10.1007/s10618-020-00731-7. Epub 2021 May 1.
2
A Pipeline for Integrated Theory and Data-Driven Modeling of Biomedical Data.
IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):811-822. doi: 10.1109/TCBB.2020.3019237. Epub 2021 Jun 3.
