Tekumalla Ramya, Banda Juan M
Mercer University, Atlanta, GA, USA.
Stanford Health Care, Stanford, CA, USA.
Genomics Inform. 2024 Oct 31;22(1):21. doi: 10.1186/s44342-024-00023-2.
Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, developing accurate phenotype definitions demands extensive literature review and input from clinical experts, rendering the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, a tendency to generate non-factual content ("hallucinations"), misleading results, and potential harm. To address these challenges, our study pursued two key objectives: (1) defining a standard evaluation set to ensure that large language model outputs are both useful and reliable, and (2) evaluating various prompting approaches for extracting phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. Nevertheless, enhanced phenotype extraction is possible, reducing the time spent on literature review and evaluation.
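The abstract does not include implementation details, but the kind of prompting workflow it describes can be illustrated in outline. The sketch below is a hypothetical example, not the authors' code: the model name, prompt wording, and the extract_phenotype_definition helper are all assumptions introduced here for illustration, using the standard OpenAI chat-completions client.

```python
# Hypothetical sketch of one prompting approach for phenotype definition
# extraction; NOT the paper's implementation. Model choice, prompt text,
# and output handling are assumptions made for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a clinical informatics assistant. Given a phenotype name, "
    "produce a candidate electronic phenotype definition: inclusion and "
    "exclusion criteria with relevant diagnosis codes, labs, and medications."
)

def extract_phenotype_definition(phenotype: str) -> str:
    """Ask the model for a candidate definition of one phenotype.

    Raw outputs may contain hallucinated codes or criteria, so, as the
    paper stresses, they still require human evaluation and validation.
    """
    response = client.chat.completions.create(
        model="gpt-4",     # placeholder model name, an assumption
        temperature=0.0,   # deterministic output simplifies evaluation
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Phenotype: {phenotype}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(extract_phenotype_definition("type 2 diabetes mellitus"))
```

In a setup like this, the generated definitions would then be scored against a standard evaluation set of expert-curated phenotype definitions, consistent with the study's first objective.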