Semi-supervised incremental learning with few examples for discovering medical association rules.

Affiliations

Telemedicine and e-Health Research Unit, Monforte de Lemos 5, Instituto de Salud Carlos III, 28029, Madrid, Spain.

Instituto Mixto UNED-ISCIII, IMIENS, 28029, Madrid, Spain.

Publication information

BMC Med Inform Decis Mak. 2022 Jan 24;22(1):20. doi: 10.1186/s12911-022-01755-3.

Abstract

BACKGROUND

Association rules are one of the main ways to represent the structural patterns underlying raw data. They express dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in predictive health. Classic algorithms for association rule mining generate huge numbers of candidate rules, which must be filtered to select those most likely to be true. Most of the techniques proposed for this filtering are unsupervised, but the accuracy of unsupervised systems is limited. Conversely, annotating data to train supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data.

METHODS

In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting from a small seed of annotated data, the model improves on the results (F-measure) obtained with a fully supervised system (standard supervised ML algorithms). The idea is to exploit, over a series of iterative steps, the agreement between the predictions of the supervised system and those of the unsupervised techniques.
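The abstract does not spell out the procedure, so the following is only a minimal sketch of one way such an agreement-based loop could look, not the authors' implementation. It assumes that each candidate rule is summarised by a 2x2 contingency table, that a candidate is promoted to the training set when the supervised prediction and a one-sided Fisher's exact test agree, and that a nearest-centroid classifier stands in for the "standard supervised ML algorithms". The function names, the features (support and confidence), the threshold `alpha`, and the iteration cap are all hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing at least `a`
    co-occurrences if antecedent and consequent were independent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / comb(n, col1)


def features(table):
    """Hypothetical feature vector for a candidate rule: (support, confidence)."""
    a, b, c, d = table
    n = a + b + c + d
    return (a / n, a / (a + b) if a + b else 0.0)


def centroid_predict(labelled, x):
    """Nearest-centroid classifier, an assumed stand-in for any supervised learner."""
    best_label, best_dist = None, float("inf")
    for label in (True, False):
        points = [f for f, y in labelled if y == label]
        if not points:
            continue
        centroid = [sum(col) / len(points) for col in zip(*points)]
        dist = sum((u - v) ** 2 for u, v in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def self_train(seed, unlabelled, alpha=0.05, max_iters=10):
    """Grow the training set with candidates on which both predictors agree.

    seed:       list of (table, is_true_rule) pairs annotated by hand
    unlabelled: list of tables for candidate rules without annotation
    """
    labelled = [(features(t), y) for t, y in seed]
    pool = list(unlabelled)
    for _ in range(max_iters):
        agreed, remaining = [], []
        for table in pool:
            supervised = centroid_predict(labelled, features(table))
            unsupervised = fisher_exact_greater(*table) < alpha
            if supervised == unsupervised:
                agreed.append((features(table), supervised))
            else:
                remaining.append(table)
        if not agreed:  # nothing new to learn from, so stop early
            break
        labelled.extend(agreed)
        pool = remaining
    return labelled, pool


# Toy data, invented purely for illustration.
seed = [((8, 2, 2, 8), True), ((5, 5, 5, 5), False)]
candidates = [(9, 1, 1, 9), (4, 6, 6, 4)]
labelled, still_unlabelled = self_train(seed, candidates)
```

In this sketch, candidates on which the two predictors disagree simply stay in the pool and are re-examined in the next iteration, once the classifier has been refit on the enlarged training set.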

RESULTS

In the task of mining medical association rules, the new semi-supervised ML algorithm improves on the results of supervised algorithms, as measured by the F-measure, while training with an affordable amount of manually annotated data.

CONCLUSIONS

A small amount of annotated data, which is easy to obtain, is enough to reach results similar to those of a supervised system. The proposal may be an important step toward practical techniques for mining association rules and generating valuable new scientific medical knowledge.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af9/8785547/81370130e688/12911_2022_1755_Fig1_HTML.jpg
