Department of Radiation Oncology, Seoul National University Hospital, Seoul, Korea.
Department of Radiation Oncology, College of Medicine, Seoul National University, Seoul, Korea.
BMC Med Inform Decis Mak. 2022 Oct 13;22(1):267. doi: 10.1186/s12911-022-02003-4.
Efficient exploration of knowledge for the treatment of recurrent glioblastoma (GBM) is critical for both clinicians and researchers. However, due to the large number of clinical trials and published articles, searching for this knowledge is very labor-intensive. In the current study, using natural language processing (NLP), we analyzed medical research corpora related to recurrent glioblastoma to find potential targets and treatments.
We fine-tuned 'SAPBERT', which was pretrained on biomedical ontologies, to perform question answering (QA) and named entity recognition (NER) tasks on medical corpora. The model was fine-tuned with the SQUAD2 dataset for the QA task and with multiple NER datasets for the NER task. Article corpora were collected by searching Web of Science for the terms "recurrent glioblastoma" and "drug target", restricted to publications from 2000 to 2020 (N = 288 articles). Clinical trial corpora were collected from 'ClinicalTrials.gov' using the search term "recurrent glioblastoma" (N = 587 studies).
For the QA task, the model achieved an F1 score of 0.79. For the NER task, the model achieved F1 scores of 0.90 and 0.76 for drug and gene name recognition, respectively. When asked which molecular targets were promising for recurrent glioblastoma, the model answered that RTK inhibitors or LPA-1 antagonists were promising. From the collected clinical trials, the model summarized the treatments in the following order: bevacizumab, temozolomide, lomustine, and nivolumab. Based on the published articles, the model found many drug-gene pairs with the NER task, and we presented them with a Circos plot and a related summary ( https://github.com/bigwiz83/NLP_rGBM ).
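For readers unfamiliar with the QA metric, F1 for extractive question answering is commonly computed as the harmonic mean of token-level precision and recall between the predicted and reference answers. The sketch below is a simplified version of that idea, assuming whitespace tokenization and lowercasing only; it is not the authors' evaluation script, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction that contains the reference plus two extra tokens.
print(round(token_f1("RTK inhibitors or antagonists", "RTK inhibitors"), 3))
```

A corpus-level score would be obtained by averaging this value over all evaluated questions.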
Using NLP deep learning models, we could explore potential targets and treatments based on medical research and clinical trial corpora. The knowledge found by the models may be used for treating recurrent glioblastoma.