Towards automated phenotype definition extraction using large language models.

Author information

Tekumalla Ramya, Banda Juan M

Affiliations

Mercer University, Atlanta, GA, USA.

Stanford Health Care, Stanford, CA, USA.

Publication information

Genomics Inform. 2024 Oct 31;22(1):21. doi: 10.1186/s44342-024-00023-2.

Abstract

Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, the tendency to generate non-factual data ("hallucinations"), misleading results, and potential harm. To address these challenges, our study pursued two key objectives: (1) defining a standard evaluation set to ensure that large language model outputs are both useful and reliable and (2) evaluating various prompting approaches to extract phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. However, enhanced phenotype extraction is possible, reducing the amount of time spent in literature review and evaluation.

Similar articles

Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models.
J Healthc Inform Res. 2024 Jan 5;8(2):438-461. doi: 10.1007/s41666-023-00155-0. eCollection 2024 Jun.
