Xu Qidi, Liu Xiaozhong, Jiang Xiaoqian, Kim Yejin
McWilliams School of Biomedical Informatics, UTHealth Houston, Houston, TX, 77030.
Computer Science and Data Science, Worcester Polytechnic Institute, Worcester, MA, 01609.
medRxiv. 2024 Dec 12:2024.12.10.24318800. doi: 10.1101/2024.12.10.24318800.
This study aims to develop an AI-driven framework that leverages large language models (LLMs) to simulate scientific reasoning and peer review in order to predict efficacious combinatorial therapies when data-driven prediction is infeasible.
Our proposed framework achieved a significantly higher accuracy (0.74) than traditional knowledge-based prediction (0.52). An ablation study highlighted the importance of high-quality few-shot examples, external knowledge integration, self-consistency, and review within the framework. External validation with private experimental data yielded an accuracy of 0.82, further confirming the framework's ability to generate high-quality hypotheses in biological inference tasks. Our framework offers an automated, knowledge-driven approach to hypothesis generation when data-driven prediction is not a viable option.
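Self-consistency, as the term is commonly used for LLM prompting, means sampling several independent answers to the same question and keeping the majority answer. The sketch below illustrates only that generic aggregation step under this assumption; it is not taken from the Coated-LLM repository, and `sample_answer` is a hypothetical stand-in for a single LLM call that returns a label such as "effective" or "not effective".

```python
from collections import Counter
from typing import Callable, List


def self_consistency_vote(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Sample several independent answers and return the most frequent one.

    `sample_answer` is a hypothetical callable representing one LLM call
    (an assumption for illustration, not the authors' API).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Collect one answer per independent sample.
    answers: List[str] = [sample_answer() for _ in range(n_samples)]
    # most_common(1) returns the top label; ties go to the label seen first.
    label, _count = Counter(answers).most_common(1)[0]
    return label


if __name__ == "__main__":
    # Deterministic fake outputs standing in for five sampled LLM answers.
    fake_outputs = iter(["effective", "not effective", "effective", "effective", "not effective"])
    print(self_consistency_vote(lambda: next(fake_outputs), n_samples=5))  # effective
```

Majority voting is the simplest aggregation rule; a real pipeline might instead weight votes or pass the disagreeing answers to a review step.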
Our source code and data are available at https://github.com/QidiXu96/Coated-LLM.