Neyman-Pearson classification algorithms and NP receiver operating characteristics.

Affiliations

Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA.

Department of Statistics, Columbia University, New York, NY 10027-5927, USA.

Publication information

Sci Adv. 2018 Feb 2;4(2):eaao1659. doi: 10.1126/sciadv.aao1659. eCollection 2018 Feb.

DOI: 10.1126/sciadv.aao1659
PMID: 29423442
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5804623/
Abstract

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
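The two error types and the NP objective described in the abstract can be written compactly. Here φ is a classifier mapping a feature vector X to {0, 1}, Y is the true label, and α is the user-chosen upper bound on type I error. The thresholded form φ_t, with a scoring function s and threshold t, is an illustrative rendering of a "scoring-type" classifier assumed for this note, not notation taken from the paper.

```latex
R_0(\phi) = \mathbb{P}\bigl(\phi(X) = 1 \mid Y = 0\bigr) \quad \text{(type I error)}, \qquad
R_1(\phi) = \mathbb{P}\bigl(\phi(X) = 0 \mid Y = 1\bigr) \quad \text{(type II error)}

\min_{\phi} \; R_1(\phi) \quad \text{subject to} \quad R_0(\phi) \le \alpha

\phi_t(x) = \mathbb{1}\{ s(x) > t \}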

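The abstract's central claim, that choosing a threshold so the empirical type I error is at most α can leave the true type I error above α far too often, can be checked with a small simulation. The Python sketch below is not the nproc package's code or API. It assumes a scoring classifier that predicts class 1 when its score exceeds a threshold t, assumes t is picked from scores of held-out class-0 observations, and uses an order-statistic rule that selects the smallest rank k whose probability of violating α is at most a tolerance δ. The function names, the parameter δ, and the Uniform(0, 1) toy score distribution are all illustrative assumptions.

```python
import math
import random
from functools import lru_cache


def violation_bound(k: int, n: int, alpha: float) -> float:
    """P(true type I error > alpha) when the threshold is the k-th smallest of
    n i.i.d. continuous class-0 scores and class 1 is predicted for score > threshold.

    For continuous scores this equals P(Binomial(n, 1 - alpha) >= k).
    """
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


@lru_cache(maxsize=None)
def np_order_index(n: int, alpha: float, delta: float) -> int:
    """Smallest 1-based rank k whose violation probability is at most delta."""
    for k in range(1, n + 1):
        if violation_bound(k, n, alpha) <= delta:
            return k
    raise ValueError("too few class-0 scores for this (alpha, delta)")


def np_threshold(class0_scores, alpha: float, delta: float) -> float:
    """Hypothetical helper: order-statistic threshold from held-out class-0 scores."""
    ordered = sorted(class0_scores)
    return ordered[np_order_index(len(ordered), alpha, delta) - 1]


def simulate(n=200, alpha=0.05, delta=0.05, trials=2000, seed=0):
    """Fraction of trials whose true type I error exceeds alpha, for the naive
    empirical rule versus the order-statistic rule.

    Toy assumption: class-0 scores are Uniform(0, 1), so the rule
    1{score > t} has true type I error exactly 1 - t.
    """
    rng = random.Random(seed)
    # Naive rule: smallest threshold with empirical type I error <= alpha,
    # i.e. leave at most floor(alpha * n) class-0 scores above it.
    k_naive = n - math.floor(alpha * n)
    naive_viol = np_viol = 0
    for _ in range(trials):
        scores = sorted(rng.random() for _ in range(n))
        if 1 - scores[k_naive - 1] > alpha:
            naive_viol += 1
        if 1 - np_threshold(scores, alpha, delta) > alpha:
            np_viol += 1
    return naive_viol / trials, np_viol / trials


if __name__ == "__main__":
    naive_rate, np_rate = simulate()
    print(f"naive rule exceeds alpha in {naive_rate:.1%} of trials")
    print(f"order-statistic rule exceeds alpha in {np_rate:.1%} of trials")
```

With the defaults, the naive rule should exceed α in roughly half of the trials or more, while the order-statistic rule should stay near or below δ. One consequence of this design: even the largest held-out score violates α with probability (1 − α)^n, so with α = δ = 0.05 the rule needs at least 59 class-0 scores, and np_order_index raises ValueError below that.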
Figures
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/5804623/63d0542b8035/aao1659-F1.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/5804623/786f1b021cdf/aao1659-F2.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/5804623/dc60f7739599/aao1659-F3.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/5804623/867dca6fab60/aao1659-F4.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/5804623/8df23367eef4/aao1659-F5.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/5804623/54881cbf0231/aao1659-F6.jpg
