He Ting, Kreimeyer Kory, Najjar Mimi, Spiker Jonathan, Fatteh Maria, Anagnostou Valsamo, Botsis Taxiarchis
Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD.
Division of Quantitative Sciences, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD.
AMIA Annu Symp Proc. 2025 May 22;2024:513-522. eCollection 2024.
Delivering effective targeted therapies requires comprehensive analysis of tumor molecular profiles and their matching to clinical phenotypes, interpreted in the context of existing knowledge described in the biomedical literature, registries, and knowledge bases. We evaluated how well natural language processing (NLP) approaches support knowledge retrieval and synthesis from the biomedical literature. Specifically, we tested PubTator 3.0, Bidirectional Encoder Representations from Transformers (BERT) models, and large language models (LLMs) on named entity recognition (NER) and relation extraction (RE) from biomedical texts. PubTator 3.0 and the BioBERT model performed best in the NER task (best F1-scores of 0.93 and 0.89, respectively). BioBERT outperformed all other solutions in the RE task (best F1-score 0.79) and in a specific use case to which it was applied, where it recognized nearly all entity mentions and most of the relations. Our findings support the use of AI-assisted approaches in facilitating precision oncology decision-making.
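For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below illustrates how it could be computed for an NER or RE task. It is not the authors' evaluation code: the abstract does not state the matching criterion, so exact matching of annotations is an assumption here, and the function name `f1_score` and the toy annotations are hypothetical.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F1) for two collections of annotations.

    Assumption: an annotation counts as correct only if it matches a gold
    annotation exactly (same span and same type).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (start offset, end offset, entity type).
gold = {(0, 4, "Gene"), (10, 15, "Variant"), (20, 28, "Disease"), (30, 39, "Chemical")}
pred = {(0, 4, "Gene"), (10, 15, "Variant"), (20, 28, "Disease"), (40, 45, "Gene")}

precision, recall, f1 = f1_score(pred, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The same function applies to relation extraction if each annotation is represented as a tuple such as (entity 1, entity 2, relation type).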