Over-optimism in bioinformatics: an illustration.

Affiliation

Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Munich, Germany.

Publication information

Bioinformatics. 2010 Aug 15;26(16):1990-8. doi: 10.1093/bioinformatics/btq323. Epub 2010 Jun 26.

Abstract

MOTIVATION

In statistical bioinformatics research, different optimization mechanisms potentially lead to 'over-optimism' in published papers. So far, however, a systematic critical study concerning the various sources underlying this over-optimism is lacking.

RESULTS

We present an empirical study on over-optimism using high-dimensional classification as an example. Specifically, we consider a 'promising' new classification algorithm, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. While this approach yields poor results in terms of error rate, we quantitatively demonstrate that it can artificially seem superior to existing approaches if we 'fish for significance'. The investigated sources of over-optimism include the optimization of datasets, of settings, of competing methods and, most importantly, of the method's characteristics. We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should always be demonstrated on independent validation data.
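The selection effect described above can be illustrated with a small simulation. The following is a minimal sketch and not the authors' R code or their regularized discriminant analysis: it assumes a toy nearest-centroid classifier, simulated data in which features and labels are independent (so the true error rate of any classifier is 0.5), and a set of hypothetical "method variants" that differ only in which random feature subset they use. All function names and parameter values are illustrative assumptions.

```python
import random


def make_null_data(n, p, rng):
    """Simulate n samples with p Gaussian features and balanced labels.

    Features are generated independently of the labels, so no classifier
    can truly do better than an error rate of 0.5.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def fit_centroids(X, y, features):
    """Compute the per-class mean of the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids


def error_rate(centroids, X, y, features):
    """Misclassification rate of the nearest-centroid rule on (X, y)."""
    wrong = 0
    for x, t in zip(X, y):
        dist = {
            label: sum((x[j] - c) ** 2 for j, c in zip(features, centroids[label]))
            for label in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        wrong += int(pred != t)
    return wrong / len(y)


def run_study(n_variants=50, n=40, p=200, k=5, seed=0):
    """Try many variants, report the best one, then validate it independently.

    Returns (best_err, mean_err, val_err):
      best_err: lowest test error among all variants (the 'fished' result)
      mean_err: average test error over all variants
      val_err:  error of the selected variant on fresh validation data
    """
    rng = random.Random(seed)
    X_train, y_train = make_null_data(n, p, rng)
    X_test, y_test = make_null_data(n, p, rng)
    X_val, y_val = make_null_data(n, p, rng)

    # Each hypothetical variant uses a different random subset of k features.
    variants = [rng.sample(range(p), k) for _ in range(n_variants)]

    results = []
    for feats in variants:
        cents = fit_centroids(X_train, y_train, feats)
        results.append((error_rate(cents, X_test, y_test, feats), feats, cents))

    best_err, best_feats, best_cents = min(results, key=lambda r: r[0])
    mean_err = sum(r[0] for r in results) / len(results)
    val_err = error_rate(best_cents, X_val, y_val, best_feats)
    return best_err, mean_err, val_err


if __name__ == "__main__":
    best, mean, val = run_study()
    print(f"best variant (selected on test data): {best:.3f}")
    print(f"average over all variants:            {mean:.3f}")
    print(f"selected variant on validation data:  {val:.3f}")
```

In this sketch, the error of the selected variant is by construction no higher than the average over all variants, even though none of them has any real predictive value. Only the error on the independent validation set is an unbiased estimate for the chosen variant, which is the point of the abstract's concluding recommendation.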

AVAILABILITY

The R code and relevant data can be downloaded from http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/overoptimism/, so that the study is completely reproducible.
