ROC-based utility function maximization for feature selection and classification with applications to high-dimensional protease data.

Author information

Liu Zhenqiu, Tan Ming

Affiliation

Division of Biostatistics, University of Maryland Greenebaum Cancer Center, Baltimore, Maryland 21201, USA.

Publication information

Biometrics. 2008 Dec;64(4):1155-61. doi: 10.1111/j.1541-0420.2008.01015.x. Epub 2008 Mar 24.

Abstract

In medical diagnosis, the diseased and nondiseased classes are usually unbalanced, and one class may be more important than the other depending on the purpose of the diagnosis. Most standard classification methods, however, are designed to maximize overall accuracy and cannot explicitly assign different costs to different classes. In this article, we propose a novel nonparametric method that directly maximizes the weighted specificity and sensitivity of the receiver operating characteristic curve. Combining advances in machine learning, optimization theory, and statistics, the proposed method has excellent generalization properties and explicitly assigns different error costs to different classes. We present experiments that compare the proposed algorithms with support vector machines and regularized logistic regression, using data from a study on HIV-1 protease as well as six publicly available datasets. Our main conclusion is that the performance of the proposed algorithm is significantly better in most cases than that of the other classifiers tested. A software package in MATLAB is available upon request.
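To make the idea of a weighted sensitivity and specificity objective concrete, the sketch below scores a simple threshold classifier with a utility of the form weight × sensitivity + (1 − weight) × specificity and picks the threshold that maximizes it. This is only an illustration of the general concept: the abstract does not give the authors' exact utility function or optimization procedure, and the function names, the linear weighting, and the grid search over thresholds are all assumptions made here.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions (1 = diseased)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def weighted_utility(labels, predictions, weight):
    """Assumed utility: weight * sensitivity + (1 - weight) * specificity."""
    sensitivity, specificity = sensitivity_specificity(labels, predictions)
    return weight * sensitivity + (1.0 - weight) * specificity


def best_threshold(labels, scores, weight):
    """Search the observed scores for the cutoff with the highest weighted utility.

    A sample is predicted diseased when its score is >= the cutoff.
    Returns (cutoff, utility).
    """
    best = None
    for cutoff in sorted(set(scores)):
        predictions = [1 if s >= cutoff else 0 for s in scores]
        utility = weighted_utility(labels, predictions, weight)
        if best is None or utility > best[1]:
            best = (cutoff, utility)
    return best
```

With unbalanced classes, changing `weight` shifts the chosen cutoff: a weight near 1 favors catching diseased cases at the expense of false positives, while a weight near 0 favors specificity. Overall accuracy, by contrast, implicitly weights the two classes by their sample sizes.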
