A statistical approach to set classification by feature selection with applications to classification of histopathology images.

Authors

Jung Sungkyu, Qiao Xingye

Affiliations

Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, U.S.A.

Department of Mathematical Sciences, Binghamton University, State University of New York, Binghamton, New York 13902-6000, U.S.A.

Publication information

Biometrics. 2014 Sep;70(3):536-45. doi: 10.1111/biom.12164. Epub 2014 Mar 3.

Abstract

Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is performed also with a set of observations. Data sets for set classification appear, for example, in diagnostics of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.
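
The pipeline outlined in the abstract (summarize each set by its location and major variation, turn the summaries into vector-valued points with multidimensional scaling, then apply a conventional classifier) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the combined distance with its `weight` parameter, the joint embedding of training and held-out sets, the nearest-centroid classifier, and the synthetic data are all hypothetical choices made for the example. The paper's data-driven feature selection step is not reproduced here.

```python
import numpy as np

def set_features(X, n_components=1):
    """Summarize one set (rows = observations) by its mean and its
    projection matrix onto the top principal directions."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:n_components].T
    # The projection matrix V V^T does not depend on the sign of each direction.
    return mu, V @ V.T

def pairwise_set_distances(sets, n_components=1, weight=1.0):
    """Assumed distance: Euclidean distance between means combined with the
    Frobenius distance between principal-subspace projection matrices."""
    feats = [set_features(X, n_components) for X in sets]
    n = len(feats)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_mean = np.linalg.norm(feats[i][0] - feats[j][0])
            d_pc = np.linalg.norm(feats[i][1] - feats[j][1])
            D[i, j] = D[j, i] = np.sqrt(d_mean**2 + weight * d_pc**2)
    return D

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D**2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Conventional classifier used here for illustration only."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Synthetic example (made-up data): 20 sets per class, 30 observations per set,
# 5 variables; the two classes differ in the location of the first variable.
rng = np.random.default_rng(0)
sets, labels = [], []
for label, shift in [(0, 0.0), (1, 3.0)]:
    for _ in range(20):
        X = rng.normal(size=(30, 5))
        X[:, 0] += shift
        sets.append(X)
        labels.append(label)
labels = np.array(labels)

D = pairwise_set_distances(sets)
Z = classical_mds(D, k=2)                    # one 2-D point per set
train = np.arange(len(sets)) % 4 != 0        # hold out every fourth set
pred = nearest_centroid_predict(Z[train], labels[train], Z[~train])
accuracy = (pred == labels[~train]).mean()
print(f"held-out set accuracy: {accuracy:.2f}")
```

Embedding the held-out sets jointly with the training sets keeps the sketch short; a real analysis would need an out-of-sample embedding rule so that test sets do not influence the coordinates used for training.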

