Learning to improve medical decision making from imbalanced data without a priori cost.

Authors

Wan Xiang, Liu Jiming, Cheung William K, Tong Tiejun

Affiliations

Department of Computer Science and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

Publication information

BMC Med Inform Decis Mak. 2014 Dec 5;14:111. doi: 10.1186/s12911-014-0111-9.

PMID: 25480146
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4261533/

Abstract

BACKGROUND

In a medical data set, data are commonly composed of a minority (positive or abnormal) group and a majority (negative or normal) group, and the cost of misclassifying a minority sample as a majority sample is high. This is the so-called imbalanced classification problem. Traditional classification functions can be seriously affected by the skewed class distribution in the data. To deal with this problem, an a priori cost is often used to adjust the learning process in pursuit of the optimal classification function. However, this a priori cost is often unknown and hard to estimate in medical decision making.
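
As a minimal illustration (not taken from the paper) of how an a priori cost steers a classifier, consider the standard expected-cost decision rule. If a model estimates probability p that a case belongs to the minority (positive) class, and missing a positive costs c_FN while a false alarm costs c_FP, then predicting positive minimizes expected cost exactly when p > c_FP / (c_FP + c_FN). The helper names `cost_threshold` and `classify`, and the example probabilities and costs, are hypothetical. The sketch only shows that the same probabilities yield different decisions under different assumed cost ratios, which is why an unknown cost is a problem.

```python
def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability above which predicting 'minority/positive' has lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation (a false alarm if the
    case is actually negative); predicting negative costs p * cost_fn (a missed
    positive). Positive is cheaper exactly when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(probs: list[float], cost_fn: float, cost_fp: float) -> list[int]:
    """Label each estimated probability 1 (minority) or 0 (majority) under assumed costs."""
    t = cost_threshold(cost_fn, cost_fp)
    return [1 if p > t else 0 for p in probs]


probs = [0.05, 0.2, 0.6]                       # hypothetical classifier outputs
print(classify(probs, cost_fn=1, cost_fp=1))   # equal costs, threshold 0.5 -> [0, 0, 1]
print(classify(probs, cost_fn=10, cost_fp=1))  # misses 10x costlier, threshold 1/11 -> [0, 1, 1]
```

Changing only the assumed cost ratio flips the decision for the middle case, even though the classifier's output is unchanged.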

METHODS

In this paper, we propose a new learning method, named RankCost, to classify imbalanced medical data without using a priori cost. Instead of focusing on improving class-prediction accuracy, RankCost maximizes the difference between the minority class and the majority class by using a scoring function, which translates the imbalanced classification problem into a partial ranking problem. The scoring function is learned via a non-parametric boosting algorithm.
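
The abstract does not spell out RankCost's boosting procedure, so the sketch below swaps in a well-known related technique, RankBoost-style bipartite ranking (Freund et al.), to illustrate the general idea of learning a scoring function that pushes minority-class samples above majority-class samples without any misclassification cost. It is not the authors' algorithm; the decision-stump weak rankers, the number of rounds, the toy data, and all function names (`train_rankboost`, `score`, `pairwise_auc`) are assumptions. `pairwise_auc` measures the quantity being pushed up: the fraction of (minority, majority) pairs that the score orders correctly.

```python
import math


def stumps(X):
    """Candidate weak rankers h(x) = 1 if x[f] > thr else 0 (hypothetical helper)."""
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for thr in values[:-1]:  # "x > max value" is never true, so skip it
            yield f, thr


def h(x, f, thr):
    return 1.0 if x[f] > thr else 0.0


def train_rankboost(X_pos, X_neg, n_rounds=10):
    """RankBoost-style bipartite ranking; returns [(alpha, feature, threshold), ...]."""
    n_p, n_n = len(X_pos), len(X_neg)
    # D[i][j]: weight of the pair (positive i, negative j); start uniform.
    D = [[1.0 / (n_p * n_n)] * n_n for _ in range(n_p)]
    candidates = list(stumps(X_pos + X_neg))
    model = []
    for _ in range(n_rounds):
        row_sum = [sum(d) for d in D]          # total pair weight per positive
        col_sum = [sum(d) for d in zip(*D)]    # total pair weight per negative
        best = None
        for f, thr in candidates:
            # r = sum_ij D_ij * (h(pos_i) - h(neg_j)), factored via row/column sums
            r = (sum(w * h(x, f, thr) for w, x in zip(row_sum, X_pos))
                 - sum(w * h(x, f, thr) for w, x in zip(col_sum, X_neg)))
            if best is None or abs(r) > abs(best[0]):
                best = (r, f, thr)
        r, f, thr = best
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)  # avoid log(0) when r = +/-1
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, f, thr))
        # Up-weight pairs the new weak ranker orders wrongly, then renormalize.
        for i, xp in enumerate(X_pos):
            for j, xn in enumerate(X_neg):
                D[i][j] *= math.exp(alpha * (h(xn, f, thr) - h(xp, f, thr)))
        z = sum(map(sum, D))
        D = [[w / z for w in d] for d in D]
    return model


def score(model, x):
    """Scoring function: weighted vote of the selected stumps."""
    return sum(alpha * h(x, f, thr) for alpha, f, thr in model)


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data: 3 minority (positive) and 5 majority (negative) samples.
X_pos = [[3.0, 1.0], [4.0, 0.0], [2.5, 1.0]]
X_neg = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [0.5, 0.0], [1.5, 1.0]]
model = train_rankboost(X_pos, X_neg, n_rounds=5)
pos_scores = [score(model, x) for x in X_pos]
neg_scores = [score(model, x) for x in X_neg]
print(pairwise_auc(pos_scores, neg_scores))  # separable on feature 0 -> 1.0
```

Because training only looks at the relative order of minority and majority scores, no cost ratio or decision threshold enters the loop; a threshold can be chosen afterwards if a hard classification is needed.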

RESULTS

We compare RankCost to several representative approaches on four medical data sets varying in size, imbalance ratio, and dimension. The experimental results demonstrate that, unlike currently available methods, which often perform unevenly under different a priori costs, RankCost delivers comparable performance consistently.

CONCLUSIONS

Learning an effective classification model from imbalanced data is a challenging task in medical data analysis. Traditional approaches often use a priori cost to adjust the learning of the classification function. This work presents a novel approach, RankCost, for learning from imbalanced medical data sets without using a priori cost. The experimental results indicate that RankCost performs very well in imbalanced data classification and can be a useful method in real-world medical decision making.

Figures (PubMed Central):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/4261533/0b02961601ff/12911_2014_111_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/4261533/fb4220c97c05/12911_2014_111_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/4261533/18e70b826407/12911_2014_111_Fig3_HTML.jpg

Similar articles

1. Learning to improve medical decision making from imbalanced data without a priori cost.
   BMC Med Inform Decis Mak. 2014 Dec 5;14:111. doi: 10.1186/s12911-014-0111-9.
2. A model driven approach to imbalanced data sampling in medical decision making.
   Stud Health Technol Inform. 2010;160(Pt 2):856-60.
3. Balanced gradient boosting from imbalanced data for clinical outcome prediction.
   Stat Appl Genet Mol Biol. 2009;8:Article20. doi: 10.2202/1544-6115.1422. Epub 2009 Apr 7.
4. Ensemble learning with active example selection for imbalanced biomedical data classification.
   IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):316-25. doi: 10.1109/TCBB.2010.96.
5. A hybrid cost-sensitive ensemble for imbalanced breast thermogram classification.
   Artif Intell Med. 2015 Nov;65(3):219-27. doi: 10.1016/j.artmed.2015.07.005. Epub 2015 Jul 31.
6. Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy.
   Evol Comput. 2009 Fall;17(3):275-306. doi: 10.1162/evco.2009.17.3.275.
7. Graph ensemble boosting for imbalanced noisy graph stream classification.
   IEEE Trans Cybern. 2015 May;45(5):940-54. doi: 10.1109/TCYB.2014.2341031. Epub 2014 Aug 27.
8. A Cost-Sensitive Deep Belief Network for Imbalanced Classification.
   IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):109-122. doi: 10.1109/TNNLS.2018.2832648. Epub 2018 May 28.
9. A learning method for the class imbalance problem with medical data sets.
   Comput Biol Med. 2010 May;40(5):509-18. doi: 10.1016/j.compbiomed.2010.03.005. Epub 2010 Mar 26.
10. Embedding Undersampling Rotation Forest for Imbalanced Problem.
   Comput Intell Neurosci. 2018 Nov 1;2018:6798042. doi: 10.1155/2018/6798042. eCollection 2018.

Cited by

1. The Digital Transformation of Healthcare Through Intelligent Technologies: A Path Dependence-Augmented-Unified Theory of Acceptance and Use of Technology Model for Clinical Decision Support Systems.
   Healthcare (Basel). 2025 May 22;13(11):1222. doi: 10.3390/healthcare13111222.
2. A Method for Medical Data Analysis Using the LogNNet for Clinical Decision Support Systems and Edge Computing in Healthcare.
   Sensors (Basel). 2021 Sep 16;21(18):6209. doi: 10.3390/s21186209.
3. Computational advances of tumor marker selection and sample classification in cancer proteomics.
   Comput Struct Biotechnol J. 2020 Jul 17;18:2012-2025. doi: 10.1016/j.csbj.2020.07.009. eCollection 2020.
4. Logistic Regression Confined by Cardinality-Constrained Sample and Feature Selection.
   IEEE Trans Pattern Anal Mach Intell. 2020 Jul;42(7):1713-1728. doi: 10.1109/TPAMI.2019.2901688. Epub 2019 Feb 26.