Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD.
Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD.
Stud Health Technol Inform. 2022 Jan 14;289:18-21. doi: 10.3233/SHTI210848.
Processing unstructured clinical texts is often necessary to support certain tasks in biomedicine, such as matching patients to clinical trials. Among other methods, domain-specific language models have been built to utilize free-text information. This study evaluated the performance of Bidirectional Encoder Representations from Transformers (BERT) models in assessing the similarity between clinical trial texts. We compared an unstructured aggregated summary of clinical trials reviewed at the Johns Hopkins Molecular Tumor Board with the ClinicalTrials.gov records, focusing on the titles and eligibility criteria. Seven pretrained BERT-based models were used in our analysis. Of the six biomedical-domain-specific models, only SciBERT outperformed the original BERT model by accurately assigning higher similarity scores to matched than to mismatched trials. This finding is promising and shows that BERT and, likely, other language models may support patient-trial matching.
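The similarity assessment described above can be sketched as comparing pooled text embeddings with cosine similarity. A minimal sketch, assuming embeddings come from a BERT checkpoint (e.g., SciBERT via the Hugging Face `transformers` library) obtained in a separate step; toy arrays stand in for real model output here:

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padded positions."""
    mask = attention_mask[:, None].astype(float)  # shape (seq_len, 1)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two pooled embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy stand-ins for a BERT last-hidden-state, shape (seq_len, hidden_dim).
# In the study, these would be produced by one of the seven pretrained models.
rng = np.random.default_rng(42)
board_summary = rng.normal(size=(5, 16))                       # tumor-board text
matched_record = board_summary + rng.normal(scale=0.1, size=(5, 16))  # similar trial text
mismatched_record = rng.normal(size=(5, 16))                   # unrelated trial text

mask = np.ones(5, dtype=int)  # no padding in this toy example
summary_vec = mean_pool(board_summary, mask)
matched_score = cosine_similarity(summary_vec, mean_pool(matched_record, mask))
mismatched_score = cosine_similarity(summary_vec, mean_pool(mismatched_record, mask))
# A well-behaved model should score the matched pair higher than the mismatched one.
```

The pooling strategy (mean pooling vs. the `[CLS]` token vector) is a design choice; the study's exact scoring procedure is not specified in the abstract.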