Bridging Data Gaps in Oncology: Large Language Models and Collaborative Filtering for Cancer Treatment Recommendations.

Author information

Tang Tengjie, Li Angkai, Tan Xingye, Ji Qingli, Si Lu, Bao Le

Affiliations

Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, U.S.A.

Key Laboratory of Carcinogenesis and Translational Research, Department of Melanoma and Sarcoma, Peking University Cancer Hospital & Institute, Beijing 100142, China.

Publication information

medRxiv. 2025 Apr 7:2025.04.07.25325243. doi: 10.1101/2025.04.07.25325243.

PMID:40297440

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12036386/

Abstract

BACKGROUND

Patients with rare cancers face substantial challenges because sparse clinical trial data leave them with few evidence-based treatment options. Advances in large language models (LLMs) and recommendation algorithms offer new opportunities to use all available clinical trial information to improve clinical decisions.

METHODS

We used an LLM to systematically extract and standardize data from more than 100,000 cancer trials registered on ClinicalTrials.gov. Each trial was annotated with a customized scoring system that reflects cancer-treatment interactions based on clinical outcomes and trial attributes. Using this structured data set, we implemented three state-of-the-art collaborative filtering algorithms to recommend potentially effective treatments across different cancer types.
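The abstract does not name the three collaborative filtering algorithms or define the scoring system. As a hedged illustration of the general technique only, the sketch below treats cancer types as "users" and interventions as "items", fits a biased matrix-factorization model by stochastic gradient descent to the observed scores alone, and then ranks the interventions not yet scored for one cancer type. The toy scores, the placeholder names (`drug_a` and so on), the functions `fit_mf` and `recommend`, and all hyperparameters are hypothetical assumptions, not the study's data, code, or settings.

```python
import random

# Hypothetical toy data: (cancer type, intervention) -> interaction score.
# Pairs that are absent correspond to combinations never scored from a trial.
SCORES = {
    ("melanoma", "drug_a"): 4.0,
    ("melanoma", "drug_b"): 2.0,
    ("sarcoma", "drug_a"): 3.5,
    ("sarcoma", "drug_c"): 4.5,
    ("renal", "drug_b"): 2.5,
    ("renal", "drug_c"): 4.0,
}


def fit_mf(scores, k=2, epochs=500, lr=0.02, reg=0.05, seed=0):
    """Fit a biased matrix-factorization model by SGD on observed entries only."""
    rng = random.Random(seed)
    cancers = sorted({c for c, _ in scores})
    drugs = sorted({d for _, d in scores})
    mu = sum(scores.values()) / len(scores)  # global mean score
    bc = {c: 0.0 for c in cancers}  # per-cancer-type bias
    bd = {d: 0.0 for d in drugs}  # per-intervention bias
    P = {c: [rng.gauss(0, 0.1) for _ in range(k)] for c in cancers}  # cancer factors
    Q = {d: [rng.gauss(0, 0.1) for _ in range(k)] for d in drugs}  # drug factors
    entries = list(scores.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (c, d), r in entries:
            pred = mu + bc[c] + bd[d] + sum(p * q for p, q in zip(P[c], Q[d]))
            err = r - pred
            # Gradient steps with L2 regularization on biases and factors.
            bc[c] += lr * (err - reg * bc[c])
            bd[d] += lr * (err - reg * bd[d])
            for f in range(k):
                pcf, qdf = P[c][f], Q[d][f]
                P[c][f] += lr * (err * qdf - reg * pcf)
                Q[d][f] += lr * (err * pcf - reg * qdf)

    def predict(c, d):
        return mu + bc[c] + bd[d] + sum(p * q for p, q in zip(P[c], Q[d]))

    return predict, cancers, drugs


def recommend(predict, drugs, scores, cancer, n=2):
    """Rank interventions with no observed score for `cancer` by predicted score."""
    untested = [d for d in drugs if (cancer, d) not in scores]
    return sorted(untested, key=lambda d: predict(cancer, d), reverse=True)[:n]


predict, cancers, drugs = fit_mf(SCORES)
print(recommend(predict, drugs, SCORES, "melanoma"))  # → ['drug_c'] (only unscored option)
```

Fitting only the observed cells is the key design choice: unscored pairs are treated as unknown rather than as zero, so the latent factors learn from tested combinations and transfer that signal to untested ones.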

RESULTS

The LLM-driven data extraction process generated a comprehensive, rigorously curated database from fragmented clinical trial information, covering 78 cancer types and 5,315 distinct interventions. The recommendation models achieved high predictive accuracy (cross-validated RMSE: 0.49–0.62) and identified clinically meaningful new treatments for melanoma, which were independently validated by oncology experts.
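RMSE is the square root of the mean squared difference between held-out observed scores and model predictions: RMSE = sqrt((1/n) × Σ (r_i − r̂_i)^2). The abstract does not describe the exact cross-validation protocol, so the following is a minimal sketch under assumptions: observed (cancer type, intervention) scores are split into k folds, each fold is held out in turn, and the per-fold RMSEs are averaged. To stay short, it uses a simple mean-plus-offsets baseline as a hypothetical stand-in model; any fitting function with the same shape (training dict in, `predict(cancer, drug)` out) could be passed instead. The data and the names `fit_baseline` and `cross_validated_rmse` are illustrative.

```python
import math
import random

# Hypothetical toy (cancer type, intervention) -> score data, not study data.
SCORES = {
    ("melanoma", "drug_a"): 4.0, ("melanoma", "drug_b"): 2.0,
    ("sarcoma", "drug_a"): 3.5, ("sarcoma", "drug_c"): 4.5,
    ("renal", "drug_b"): 2.5, ("renal", "drug_c"): 4.0,
}


def rmse(pairs):
    """Root-mean-square error over (observed, predicted) pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


def fit_baseline(train):
    """Hypothetical stand-in model: global mean plus mean cancer and drug offsets."""
    mu = sum(train.values()) / len(train)

    def offsets(idx):
        groups = {}
        for key, r in train.items():
            groups.setdefault(key[idx], []).append(r - mu)
        return {g: sum(v) / len(v) for g, v in groups.items()}

    bc, bd = offsets(0), offsets(1)
    # Cancers or interventions absent from the training folds get an offset of 0.
    return lambda c, d: mu + bc.get(c, 0.0) + bd.get(d, 0.0)


def cross_validated_rmse(scores, fit, k=3, seed=0):
    """k-fold CV over observed entries; returns the mean of the per-fold RMSEs."""
    entries = list(scores.items())
    random.Random(seed).shuffle(entries)
    folds = [entries[i::k] for i in range(k)]
    fold_errors = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = dict(e for j, f in enumerate(folds) if j != i for e in f)
        predict = fit(train)
        fold_errors.append(rmse([(r, predict(c, d)) for (c, d), r in test]))
    return sum(fold_errors) / k


print(round(cross_validated_rmse(SCORES, fit_baseline), 3))
```

Averaging per-fold RMSEs is one convention; pooling all held-out errors before taking the square root is another, and the two can differ slightly.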

CONCLUSIONS

Our study provides a proof of concept that combining LLMs with sophisticated recommendation algorithms can systematically identify novel, clinically plausible cancer treatments. This integrated approach may accelerate the identification of effective therapies for rare cancers and ultimately improve patient outcomes by generating evidence-based treatment recommendations where traditional data sources remain limited.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/823f/12036386/42d819286766/nihpp-2025.04.07.25325243v1-f0001.jpg

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Leveraging Large Language Models for Decision Support in Personalized Oncology.

JAMA Netw Open. 2023 Nov 1;6(11):e2343689. doi: 10.1001/jamanetworkopen.2023.43689.

Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery.

Eur Arch Otorhinolaryngol. 2025 Mar;282(3):1593-1607. doi: 10.1007/s00405-024-09153-3. Epub 2025 Jan 10.

The interaction of structured data using openEHR and large Language models for clinical decision support in prostate cancer.

World J Urol. 2025 Jan 13;43(1):67. doi: 10.1007/s00345-024-05423-1.

Identifying Deprescribing Opportunities With Large Language Models in Older Adults: Retrospective Cohort Study.

JMIR Aging. 2025 Apr 11;8:e69504. doi: 10.2196/69504.

Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards: Algorithmic Development and Validation Study.

J Med Internet Res. 2025 Mar 5;27:e64364. doi: 10.2196/64364.

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis.

J Med Internet Res. 2024 Dec 27;26:e66114. doi: 10.2196/66114.

Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.

JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984.
