Edelman Brice, Skolnick Jeffrey
Georgia Tech Center for the Study of Systems Biology, Atlanta, GA, USA.
BMC Bioinformatics. 2025 May 28;26(1):140. doi: 10.1186/s12859-025-06159-4.
The exponential growth of scientific publications poses a formidable challenge for researchers seeking to validate emerging hypotheses or synthesize existing evidence. In this paper, we introduce Valsci, an open-source, self-hostable utility that automates large-batch scientific claim verification using any OpenAI-compatible large language model. Valsci unites retrieval-augmented generation with structured bibliometric scoring and chain-of-thought prompting, enabling users to efficiently search, evaluate, and summarize evidence from the Semantic Scholar database and other academic sources. Unlike conventional standalone LLMs, which often suffer from hallucinations and unreliable citations, Valsci grounds its analyses in verifiable published findings. A guided prompt-flow approach is employed to generate query expansions, retrieve relevant excerpts, and synthesize coherent, evidence-based reports.
Preliminary evaluations on claims from the SciFact benchmark dataset show that Valsci achieves a significantly lower citation hallucination rate than base GPT-4o outputs while maintaining a low misclassification rate. The system is highly scalable, processing hundreds of claims per hour through asynchronous parallelization.
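As a rough illustration of how a guided prompt flow and asynchronous parallelization can fit together, the following Python sketch runs the three stages named above (query expansion, excerpt retrieval, report synthesis) for many claims concurrently, with a cap on in-flight claims. It is not Valsci's actual code: the function names, the `max_concurrent` parameter, and the stub bodies are all hypothetical, and a real deployment would replace the stubs with calls to an OpenAI-compatible model and a literature search service.

```python
import asyncio

# Hypothetical stage stubs. Each one stands in for a network call
# (LLM prompt or literature search) and only fakes a result.
async def expand_queries(claim: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for an LLM request
    return [claim, f"evidence for {claim}"]

async def retrieve_excerpts(query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for a search request
    return [f"excerpt matching '{query}'"]

async def synthesize_report(claim: str, excerpts: list[str]) -> dict:
    await asyncio.sleep(0)  # placeholder for an LLM request
    return {"claim": claim, "n_excerpts": len(excerpts)}

async def verify_claim(claim: str, limiter: asyncio.Semaphore) -> dict:
    # The semaphore bounds how many claims are processed at once,
    # which keeps request rates to external services under control.
    async with limiter:
        queries = await expand_queries(claim)
        # Retrieval for each expanded query runs concurrently.
        per_query = await asyncio.gather(*(retrieve_excerpts(q) for q in queries))
        excerpts = [e for batch in per_query for e in batch]
        return await synthesize_report(claim, excerpts)

async def verify_batch(claims: list[str], max_concurrent: int = 8) -> list[dict]:
    limiter = asyncio.Semaphore(max_concurrent)
    # gather preserves input order, so reports line up with claims.
    return await asyncio.gather(*(verify_claim(c, limiter) for c in claims))

def run_batch(claims: list[str], max_concurrent: int = 8) -> list[dict]:
    return asyncio.run(verify_batch(claims, max_concurrent))
```

Calling `run_batch(["claim one", "claim two"], max_concurrent=2)` returns one report dictionary per claim, in input order. Bounding concurrency with a semaphore is one common design choice for this kind of workload; whether Valsci uses this exact mechanism is an assumption.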
By providing an open and transparent platform for large-batch literature verification, Valsci substantially lowers the barrier to comprehensive evidence-based reviews and fosters a more reproducible research ecosystem.