Scientific knowledge is possible with small-sample classification.

Authors

Dougherty Edward R, Dalton Lori A

Affiliation

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.

Publication

EURASIP J Bioinform Syst Biol. 2013 Aug 20;2013(1):10. doi: 10.1186/1687-4153-2013-10.

Abstract

A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from this data, but we are given no good reason or conditions under which this algorithm should perform well. An error estimation rule is used to estimate the classification error on the population using the same data, but once again we are given no good reason or conditions under which this error estimator should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, in virtually all such papers the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims.

Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, it is that they utilize methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge in the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate.
