Semi-supervised incremental learning with few examples for discovering medical association rules.

Affiliations

Telemedicine and e-Health Research Unit, Monforte de Lemos 5, Instituto de Salud Carlos III, 28029, Madrid, Spain.

Instituto Mixto UNED-ISCIII, IMIENS, 28029, Madrid, Spain.

Publication information

BMC Med Inform Decis Mak. 2022 Jan 24;22(1):20. doi: 10.1186/s12911-022-01755-3.

DOI:10.1186/s12911-022-01755-3

PMID:35073885

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8785547/

Abstract

BACKGROUND

Association rules are one of the main ways to represent the structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in predictive health. Classic algorithms for association rule mining generate huge numbers of candidate rules, which must be filtered to select those most likely to be true. Most of the techniques proposed for this filtering are unsupervised, but the accuracy of unsupervised systems is limited. Conversely, annotating data to train supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data.
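To make the filtering problem concrete, the following is a minimal sketch of how candidate rules multiply and how they are commonly scored. Support and confidence are the standard measures of classic association rule mining; the abstract does not name them, so their use here is an assumption. The patient records and item names are invented for illustration.

```python
# Toy patient records; every item name is made up for illustration.
records = [
    {"diabetes", "hypertension", "obesity"},
    {"diabetes", "hypertension"},
    {"hypertension", "obesity"},
    {"diabetes", "obesity"},
    {"diabetes", "hypertension", "obesity"},
]

def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, records)
    return support(antecedent | consequent, records) / base if base else 0.0

def candidate_rules(records):
    """Enumerate every single-item rule A -> B with A != B."""
    items = sorted(set().union(*records))
    return [(frozenset({a}), frozenset({b}))
            for a in items for b in items if a != b]

rules = candidate_rules(records)
scored = [(a, b, confidence(a, b, records)) for a, b in rules]
```

Even with three items there are six single-item rules; with multi-item antecedents the count grows combinatorially, which is why a filtering step is needed.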

METHODS

In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting from a small seed of annotated data, the model improves on the results (F-measure) obtained with a fully supervised system (standard supervised ML algorithms). The idea is to exploit the agreement between the predictions of the supervised system and those of the unsupervised techniques over a series of iterative steps.
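The abstract gives only the outline of the method, so the following is a rough sketch of one way an agreement-based loop could look, not the authors' implementation. Assumptions: a one-sided Fisher's exact test on a 2x2 co-occurrence table serves as the unsupervised vote, a nearest-centroid classifier stands in for the "standard supervised ML algorithms", and a candidate rule joins the training set only when both votes agree. The names `fisher_p_greater`, `nearest_centroid`, `agreement_self_training`, the threshold `alpha`, and the toy data are all hypothetical.

```python
from math import comb

def fisher_p_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a = records with antecedent and consequent, b = antecedent only,
    c = consequent only, d = neither. A small p suggests positive association.
    """
    n, row1, col1 = a + b + c + d, a + b, a + c
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)

def nearest_centroid(train):
    """Stand-in supervised learner; needs at least one example of each class."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = centroid([x for x, y in train if y])
    neg = centroid([x for x, y in train if not y])
    def dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return lambda x: dist(x, pos) < dist(x, neg)

def agreement_self_training(seed, pool, alpha=0.05, max_iter=10):
    """seed: list of (features, label); pool: list of (features, 2x2 table)."""
    labelled, pool = list(seed), list(pool)
    for _ in range(max_iter):
        predict = nearest_centroid(labelled)
        agreed, rest = [], []
        for x, table in pool:
            supervised = predict(x)
            unsupervised = fisher_p_greater(*table) < alpha
            if supervised == unsupervised:
                agreed.append((x, supervised))  # both votes match: trust the label
            else:
                rest.append((x, table))         # disagreement: keep for later rounds
        if not agreed:
            break
        labelled += agreed
        pool = rest
    return labelled, pool

# Hypothetical toy data: one feature per rule plus its co-occurrence counts.
seed = [((0.9,), True), ((0.1,), False)]
pool = [((0.8,), (10, 0, 0, 10)), ((0.2,), (5, 5, 5, 5)), ((0.7,), (5, 5, 5, 5))]
labelled, remaining = agreement_self_training(seed, pool)
```

In this toy run the first two pool rules are absorbed into the training set because both votes agree, while the third stays unlabelled because the classifier accepts it and the test does not.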

RESULTS

The new semi-supervised ML algorithm improves on the results of supervised algorithms, as measured by the F-measure, in the task of mining medical association rules, while training on an affordable amount of manually annotated data.
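For reference, the F-measure used for the comparison can be computed as in this short sketch. The abstract does not say which weighting is used, so the default `beta=1.0` (the balanced F1 score) is an assumption, and the labels below are invented.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-beta score for boolean labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of accepted rules that are truly valid
    recall = tp / (tp + fn)     # share of valid rules that were accepted
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical gold annotations and system decisions for five candidate rules.
gold = [True, True, True, False, False]
predicted = [True, True, False, True, False]
score = f_measure(gold, predicted)
```

Here two of the three accepted rules are valid and two of the three valid rules are accepted, so precision and recall both equal 2/3.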

CONCLUSIONS

Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af9/8785547/81370130e688/12911_2022_1755_Fig1_HTML.jpg
