


Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation without the selected completely at random assumption.

Authors

Kumar Praveen, Lambert Christophe G

Affiliations

Department of Internal Medicine, Division of Translational Informatics, University of New Mexico, Albuquerque, United States.

Publication

PeerJ Comput Sci. 2024 Nov 5;10:e2451. doi: 10.7717/peerj-cs.2451. eCollection 2024.

DOI: 10.7717/peerj-cs.2451
PMID: 39650456
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11622864/
Abstract

Positive and unlabeled (PU) learning is a type of semi-supervised binary classification where the machine learning algorithm differentiates between a set of positive instances (labeled) and a set of both positive and negative instances (unlabeled). PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain, and there is value in discovering positives among the unlabeled (e.g., viable drugs among untested compounds). Most PU learning algorithms make the selected completely at random (SCAR) assumption, namely that positives are selected independently of their features. However, in many real-world applications, such as healthcare, positives are not SCAR (e.g., severe cases are more likely to be diagnosed), leading to a poor estimate of the proportion, α, of positives among unlabeled examples and poor model calibration, resulting in an uncertain decision threshold for selecting positives. PU learning algorithms vary; some estimate only the proportion, α, of positives in the unlabeled set, while others calculate the probability that each specific unlabeled instance is positive, and some can do both. We propose two PU learning algorithms to estimate α, calculate calibrated probabilities for PU instances, and improve classification metrics: i) PULSCAR (positive unlabeled learning selected completely at random), and ii) PULSNAR (positive unlabeled learning selected not at random). PULSNAR employs a divide-and-conquer approach to cluster SNAR positives into subtypes and estimates α for each subtype by applying PULSCAR to positives from each cluster and all unlabeled. In our experiments, PULSNAR outperformed state-of-the-art approaches on both synthetic and real-world benchmark datasets.
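The abstract's two ideas can be sketched in toy form. Under the mixture model f_u = α·f_p + (1−α)·f_n with f_n ≥ 0, the density ratio f_u(x)/f_p(x) upper-bounds α everywhere, so its minimum is a natural SCAR-style estimate; the divide-and-conquer step then applies that estimator once per positive subtype and sums the results. The sketch below is a hypothetical, deliberately simplified illustration (histogram densities on 1-D scores, pre-made clusters); it is NOT the paper's PULSCAR/PULSNAR implementation, which works on calibrated classifier probabilities with a more careful density and bin-width choice.

```python
def scar_alpha(pos, unl, bins=20):
    """Toy SCAR-style estimate of the positive fraction alpha in `unl`.

    Since f_u = alpha*f_p + (1-alpha)*f_n and f_n >= 0, we have
    alpha <= f_u(x)/f_p(x) for every x, so the minimum histogram-density
    ratio is a crude estimate of alpha (a stand-in for PULSCAR).
    """
    lo, hi = min(pos + unl), max(pos + unl)
    width = (hi - lo) / bins or 1.0

    def density(xs):
        # Normalized histogram over [lo, hi] with `bins` equal bins.
        h = [0] * bins
        for x in xs:
            h[min(int((x - lo) / width), bins - 1)] += 1
        return [c / len(xs) for c in h]

    f_p, f_u = density(pos), density(unl)
    ratios = [f_u[i] / f_p[i] for i in range(bins) if f_p[i] > 0]
    return min(1.0, min(ratios)) if ratios else 0.0


def pulsnar_alpha(pos_clusters, unl, bins=20):
    """PULSNAR-style divide and conquer: estimate alpha separately for
    each positive subtype (clustering assumed done upstream) against all
    unlabeled instances, then sum the per-subtype estimates."""
    return min(1.0, sum(scar_alpha(c, unl, bins) for c in pos_clusters))


# Toy 1-D scores in three regions: negatives in (0, 1), positive subtype A
# in (1, 2), subtype B in (2, 3). The unlabeled set mixes 120 negatives,
# 50 subtype-A, and 30 subtype-B instances, so the true alpha is 0.4.
grid = [(i + 0.5) / 10 for i in range(30)]
pos_a = grid[10:20] * 5
pos_b = grid[20:30] * 5
unl = grid[0:10] * 12 + grid[10:20] * 5 + grid[20:30] * 3

print(round(scar_alpha(pos_a + pos_b, unl, bins=30), 2))     # pooled SCAR underestimates: 0.3
print(round(pulsnar_alpha([pos_a, pos_b], unl, bins=30), 2))  # per-subtype sum recovers: 0.4
```

Pooling the two subtypes dilutes the rarer one, so the single SCAR estimate is pulled down to the smaller per-region ratio; estimating each subtype against all unlabeled and summing, as the abstract describes, recovers the full proportion in this toy SNAR setting.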


[Figures 1-16 (peerj-cs-10-2451-g001 through g016): available via the full text at PMC11622864.]

Similar Articles

1. Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation without the selected completely at random assumption.
PeerJ Comput Sci. 2024 Nov 5;10:e2451. doi: 10.7717/peerj-cs.2451. eCollection 2024.
2. KG2ML: Integrating Knowledge Graphs and Positive Unlabeled Learning for Identifying Disease-Associated Genes.
medRxiv. 2025 Mar 17:2025.03.17.25323906. doi: 10.1101/2025.03.17.25323906.
3. Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.
BMC Bioinformatics. 2024 Jun 19;25(1):218. doi: 10.1186/s12859-024-05834-2.
4. AdaSampling for Positive-Unlabeled and Label Noise Learning With Bioinformatics Applications.
IEEE Trans Cybern. 2019 May;49(5):1932-1943. doi: 10.1109/TCYB.2018.2816984. Epub 2018 Apr 2.
5. Detecting Opioid Use Disorder in Health Claims Data With Positive Unlabeled Learning.
IEEE J Biomed Health Inform. 2025 Feb;29(2):750-757. doi: 10.1109/JBHI.2024.3515805. Epub 2025 Feb 10.
6. Positive-unlabeled learning in bioinformatics and computational biology: a brief review.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab461.
7. Information-Theoretic Representation Learning for Positive-Unlabeled Classification.
Neural Comput. 2021 Jan;33(1):244-268. doi: 10.1162/neco_a_01337. Epub 2020 Oct 20.
8. A network-based positive and unlabeled learning approach for fake news detection.
Mach Learn. 2022;111(10):3549-3592. doi: 10.1007/s10994-021-06111-6. Epub 2021 Nov 18.
9. A recent survey on instance-dependent positive and unlabeled learning.
Fundam Res. 2022 Oct 12;5(2):796-803. doi: 10.1016/j.fmre.2022.09.019. eCollection 2025 Mar.
10. Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning.
IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):918-932. doi: 10.1109/TPAMI.2019.2941684. Epub 2021 Feb 4.

Cited By

1. Semi-supervised detection of natural selection with positive-unlabeled learning.
bioRxiv. 2025 Aug 18:2025.08.15.670602. doi: 10.1101/2025.08.15.670602.
2. KG2ML: Integrating Knowledge Graphs and Positive Unlabeled Learning for Identifying Disease-Associated Genes.
medRxiv. 2025 Mar 17:2025.03.17.25323906. doi: 10.1101/2025.03.17.25323906.
3. Detecting Opioid Use Disorder in Health Claims Data With Positive Unlabeled Learning.
IEEE J Biomed Health Inform. 2025 Feb;29(2):750-757. doi: 10.1109/JBHI.2024.3515805. Epub 2025 Feb 10.

References

1. Instance-Dependent Positive and Unlabeled Learning with Labeling Bias Estimation.
IEEE Trans Pattern Anal Mach Intell. 2021 Feb 23;PP. doi: 10.1109/TPAMI.2021.3061456.
2. Particle Size Distributions from Electron Microscopy Images: Avoiding Pitfalls.
J Phys Chem A. 2020 Dec 3;124(48):10075-10081. doi: 10.1021/acs.jpca.0c07840. Epub 2020 Nov 17.
3. SciPy 1.0: fundamental algorithms for scientific computing in Python.
Nat Methods. 2020 Mar;17(3):261-272. doi: 10.1038/s41592-019-0686-2. Epub 2020 Feb 3.
4. Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
Psychol Methods. 2012 Jun;17(2):228-43. doi: 10.1037/a0027127. Epub 2012 Feb 6.
5. PSoL: a positive sample only learning algorithm for finding non-coding RNA genes.
Bioinformatics. 2006 Nov 1;22(21):2590-6. doi: 10.1093/bioinformatics/btl441. Epub 2006 Aug 31.