Reduction of supervision for biomedical knowledge discovery.

Authors

Theodoropoulos Christos, Coman Andrei Catalin, Henderson James, Moens Marie-Francine

Affiliations

Computer Science Department, KU Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium.

Natural Language Understanding group, Idiap Research Institute, Rue Marconi 19, 1920, Martigny, Switzerland.

Publication information

BMC Bioinformatics. 2025 Sep 1;26(1):225. doi: 10.1186/s12859-025-06187-0.

DOI:10.1186/s12859-025-06187-0
PMID:40890616
Abstract

BACKGROUND

Knowledge discovery in scientific literature is hindered by the increasing volume of publications and the scarcity of extensive annotated data. To tackle the challenge of information overload, it is essential to employ automated methods for knowledge extraction and processing. Finding the right balance between the level of supervision and the effectiveness of models poses a significant challenge. While supervised techniques generally result in better performance, they have the major drawback of demanding labeled data. This requirement is labor-intensive, time-consuming, and hinders scalability when exploring new domains.

METHODS AND RESULTS

In this context, our study addresses the challenge of identifying semantic relationships between biomedical entities (e.g., diseases, proteins, medications) in unstructured text while minimizing dependency on supervision. We introduce a suite of unsupervised algorithms based on dependency trees and attention mechanisms and employ a range of pointwise binary classification methods. Transitioning from weakly supervised to fully unsupervised settings, we assess the methods' ability to learn from data with noisy labels. The evaluation on four biomedical benchmark datasets explores the effectiveness of the methods, demonstrating their potential to enable scalable knowledge discovery systems less reliant on annotated datasets.
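As a rough illustration of how a dependency tree could supply noisy labels without annotation, the sketch below marks an entity pair as related when the two entity tokens lie close together in the tree. This is not the authors' algorithm: the function names, the `max_path_len` threshold, and the hand-written parse are all assumptions made for the example.

```python
from collections import deque


def dependency_path_length(heads, src, dst):
    """Count the edges on the path between tokens src and dst.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    # Build an undirected adjacency list from the head array.
    neighbours = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].add(head)
            neighbours[head].add(child)

    # Breadth-first search from src until dst is reached.
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # reached only if heads does not describe one connected tree


def weak_label(heads, entity_a, entity_b, max_path_len=3):
    """Hypothetical heuristic: 1 if the entities are close in the tree, else 0."""
    length = dependency_path_length(heads, entity_a, entity_b)
    return int(length is not None and length <= max_path_len)


# Toy sentence with a hand-written parse (an assumption, not parser output).
tokens = ["Aspirin", "reduces", "the", "risk", "of", "stroke"]
heads = [1, -1, 3, 1, 5, 3]

label = weak_label(heads, tokens.index("Aspirin"), tokens.index("stroke"))
```

Labels produced this way are noisy by construction, which is the kind of input a binary classifier in a weakly supervised setting would have to tolerate.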

CONCLUSION

Our approach tackles a central issue in knowledge discovery: balancing performance with minimal supervision, which is crucial to adapting models to varied and changing domains. This study also investigates the use of pointwise binary classification techniques within a weakly supervised framework for knowledge discovery. By gradually decreasing supervision, we assess the robustness of these techniques in handling noisy labels, revealing their capability to shift from weakly supervised to entirely unsupervised scenarios. Comprehensive benchmarking offers insights into the effectiveness of these techniques, examining how unsupervised methods can reliably capture complex relationships in biomedical texts. These results suggest an encouraging direction toward scalable, adaptable knowledge discovery systems, representing progress in creating data-efficient methodologies for extracting useful insights when annotated data is limited.
