McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St #600, Houston, TX 77030, United States.
Database (Oxford). 2024 Oct 23;2024. doi: 10.1093/database/baae103.
This manuscript presents PheNormGPT, a framework for extraction and normalization of key findings in clinical text. PheNormGPT relies on an innovative approach, leveraging large language models to extract key findings and phenotypic data in unstructured clinical text and map them to Human Phenotype Ontology concepts. It utilizes OpenAI's GPT-3.5 Turbo and GPT-4 models with fine-tuning and few-shot learning strategies, including a novel few-shot learning strategy for custom-tailored few-shot example selection per request. PheNormGPT was evaluated in the BioCreative VIII Track 3: Genetic Phenotype Extraction from Dysmorphology Physical Examination Entries shared task. PheNormGPT achieved an F1 score of 0.82 for standard matching and 0.72 for exact matching, securing first place for this shared task.
本文提出了 PheNormGPT,这是一个从临床文本中提取和规范化关键发现的框架。PheNormGPT 依赖于一种创新的方法,利用大型语言模型从非结构化的临床文本中提取关键发现和表型数据,并将其映射到人类表型本体概念。它使用 OpenAI 的 GPT-3.5 Turbo 和 GPT-4 模型进行微调,并采用了 few-shot 学习策略,包括一种新的针对每个请求定制的 few-shot 示例选择的 few-shot 学习策略。PheNormGPT 在 BioCreative VIII 第 3 轨道:从发育异常体格检查条目遗传表型提取共享任务中进行了评估。PheNormGPT 在标准匹配方面的 F1 得分为 0.82,在精确匹配方面的 F1 得分为 0.72,在这个共享任务中获得了第一名。