Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms.

Author information

Tai Feng, Pan Wei

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA.

Publication information

Bioinformatics. 2007 Jul 15;23(14):1775-82. doi: 10.1093/bioinformatics/btm234. Epub 2007 May 5.

Abstract

MOTIVATION

Many methods have been proposed for classifying samples (e.g. tumors) with microarray gene expression data. However, almost all of them ignore existing biological knowledge and treat all genes equally a priori. Because previous studies have identified some genes as having biological functions, or as being involved in pathways, related to the outcome (e.g. cancer), incorporating this type of prior knowledge into a classifier can potentially improve both the predictive performance and the interpretability of the resulting model.

RESULTS

We propose a simple and general framework for incorporating such prior knowledge when building a penalized classifier. As two concrete examples, we apply the idea to two penalized classifiers: nearest shrunken centroids (also called PAM) and penalized partial least squares (PPLS). Instead of treating all genes equally a priori, as standard penalized methods do, we group the genes according to their functional associations, based on existing biological knowledge or data, and adopt group-specific penalty terms and penalization parameters. Simulated and real data examples demonstrate that, if the prior knowledge on gene grouping is indeed informative, our new methods perform better than the two standard penalized methods, yielding higher predictive accuracy and screening out more irrelevant genes.
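The idea of group-specific penalization can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified nearest-shrunken-centroids variant in which each gene's standardized centroid offset is soft-thresholded by the threshold of its group. The function names, the `groups` and `deltas` arguments, and the small constant `eps` are all assumptions made for illustration, and the class-size scaling factor used in PAM is omitted for brevity.

```python
import numpy as np

def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become zero."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def fit_group_shrunken_centroids(X, y, groups, deltas, eps=1e-8):
    """Simplified shrunken centroids with one threshold per gene group.

    X      : (n_samples, n_genes) expression matrix
    y      : (n_samples,) class labels
    groups : (n_genes,) integer group index of each gene (assumed prior grouping)
    deltas : (n_groups,) threshold per group (assumed tuning parameters)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    overall = X.mean(axis=0)
    class_means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class standard deviation of each gene.
    resid = X - class_means[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (len(y) - len(classes))) + eps
    # Each gene inherits the threshold of the group it belongs to.
    gene_delta = np.asarray(deltas, dtype=float)[np.asarray(groups)]
    d = (class_means - overall) / s
    centroids = overall + s * soft_threshold(d, gene_delta)
    return classes, centroids, s

def predict(X_new, classes, centroids, s):
    """Assign each sample to the class with the nearest shrunken centroid."""
    X_new = np.asarray(X_new, dtype=float)
    dist = (((X_new[:, None, :] - centroids[None, :, :]) / s) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]
```

In this sketch, a group believed to be relevant can be given a small threshold so its genes are retained, while a group with no prior support gets a larger threshold so that weakly differential genes are shrunk all the way to the overall mean and drop out of the classifier. In practice the thresholds would be chosen by cross-validation.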
