A Generic Semi-Supervised and Active Learning Framework for Biomedical Text Classification.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:4445-4448. doi: 10.1109/EMBC48229.2022.9871846.

Abstract

Biomedical text classification requires training examples labeled by clinical specialists, a process that can be costly. To address this problem, active learning incrementally selects a subset of the most informative unlabeled examples, which are then labeled and used to train a given classifier, with the aim of reducing the number of labeled samples required. However, active learning does not use the remaining unlabeled examples; incorporating semi-supervised techniques that do use them could improve the representativeness of the data and the discriminatory power of the classifiers. This work proposes a generic semi-supervised learning framework for improving active learning and reducing the number of labeled training examples in biomedical text classification. The proposed framework combines manually annotated training examples selected by active learning with pseudo-labels obtained from a trained classifier. To evaluate the framework, three biomedical datasets with textual information on obesity and smoking habits were used across different classification algorithms. The classification results show that the proposed framework can reduce the number of training examples manually labeled by clinical specialists by 10% without affecting the performance of the classifiers. This performance is attributable to the ability of the classifiers to correctly select and label the training examples.

Clinical relevance: We demonstrate the effectiveness of the proposed semi-supervised learning framework in reducing the manual labeling effort required of clinical specialists to train classifiers on biomedical texts.
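The combination of active learning and pseudo-labeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the query strategy, the confidence threshold, or the classifiers used, so least-confidence sampling, the 0.95 threshold, the toy nearest-centroid classifier, and every name below (fit_centroids, run_round, oracle) are assumptions chosen only to keep the example self-contained.

```python
import math

def fit_centroids(X, y):
    """Toy classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_proba(model, x):
    """Softmax over negative squared distances to the class centroids."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu))
              for c, mu in model.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def predict(model, x):
    proba = predict_proba(model, x)
    return max(proba, key=proba.get)

def run_round(X_manual, y_manual, pool, oracle, n_queries=2, threshold=0.95):
    """One round: query the least confident samples, pseudo-label confident ones."""
    model = fit_centroids(X_manual, y_manual)
    confidence = [max(predict_proba(model, x).values()) for x in pool]

    # Active learning step (assumed strategy: least-confidence sampling).
    ranked = sorted(range(len(pool)), key=lambda i: confidence[i])
    queried = sorted(ranked[:n_queries])
    X_manual = X_manual + [pool[i] for i in queried]
    y_manual = y_manual + [oracle(pool[i]) for i in queried]

    # Semi-supervised step: keep the classifier's own label where it is
    # confident enough (the threshold is an assumption).
    rest = [i for i in range(len(pool)) if i not in queried]
    pseudo = [(pool[i], predict(model, pool[i]))
              for i in rest if confidence[i] >= threshold]

    # Retrain on manual labels plus pseudo-labels.
    X_train = X_manual + [x for x, _ in pseudo]
    y_train = y_manual + [label for _, label in pseudo]
    model = fit_centroids(X_train, y_train)
    remaining = [pool[i] for i in rest]
    return model, X_manual, y_manual, remaining, pseudo

# Hypothetical 2-D features standing in for vectorized texts.
seed_X = [(0.0, 0.0), (4.0, 4.0)]
seed_y = [0, 1]
pool = [(0.2, 0.1), (3.9, 4.2), (2.3, 2.0), (1.7, 2.0), (0.5, 0.3), (3.6, 3.8)]

def oracle(x):
    """Stands in for the clinical specialist who supplies true labels."""
    return int(x[0] + x[1] > 4.0)

model, X_manual, y_manual, remaining, pseudo = run_round(seed_X, seed_y, pool, oracle)
```

In this toy run only the two points nearest the class boundary are sent to the oracle, while the four points the classifier is already confident about receive pseudo-labels. That mirrors the division of labor the abstract describes, although a real system would use text features and stronger classifiers.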
