Wickramarachchi Anuradha, Tonni Shakila, Majumdar Sonali, Karimi Sarvnaz, Kõks Sulev, Hosking Brendan, Rambla Jordi, Twine Natalie A, Jain Yatish, Bauer Denis C
Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Adelaide, SA 5000, Australia.
Data61, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW 2015, Australia.
Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf079.
Enabling clinicians and researchers to directly interact with global genomic data resources by removing technological barriers is vital for medical genomics. AskBeacon enables large language models (LLMs) to be applied to securely shared cohorts via the Global Alliance for Genomics and Health Beacon protocol. By simply "asking" Beacon, actionable insights can be gained, analyzed, and made publication-ready.
In the Parkinson's Progression Markers Initiative (PPMI), we use natural language to ask whether the sex differences observed in Parkinson's disease are due to X-linked or autosomal markers. AskBeacon returns a publication-ready visualization showing that, in PPMI, the autosomal marker occurred 1.4 times more often in males with Parkinson's disease than in females, whereas the X-linked marker showed no difference. We evaluate commercial and open-weight LLMs, as well as different architectures, to identify the best strategy for translating research questions into Beacon queries. AskBeacon implements extensive safety guardrails to ensure that genomic data is never exposed to the LLM directly and that the generated code for data extraction, analysis, and visualization is sanitized and hallucination-resistant, so data cannot be leaked or falsified.
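To make the query-and-compare step concrete, the sketch below builds two Beacon v2-style variant count requests, one per sex, and computes the kind of between-group frequency ratio reported above. It is a hypothetical illustration, not AskBeacon's query generator. The assumptions are as follows. The request layout (meta, query, requestParameters, filters, requestedGranularity) follows our reading of the GA4GH Beacon v2 request schema. The NCIT sex terms are assumed. The helper names build_variant_query and frequency_ratio are invented here. The variant coordinates and the counts are placeholders, not PPMI data. A disease filter is omitted for brevity.

```python
import json

# Assumed NCIT ontology terms for sex, as commonly used in Beacon v2 filters.
SEX_TERMS = {"male": "NCIT:C20197", "female": "NCIT:C16576"}


def build_variant_query(chrom, start, ref, alt, sex, assembly="GRCh38"):
    """Build a Beacon v2-style g_variants request body restricted to one sex.

    Hypothetical helper: field names follow the Beacon v2 request schema
    as we understand it; this is not AskBeacon's actual implementation.
    """
    return {
        "meta": {"apiVersion": "v2.0"},
        "query": {
            "requestParameters": {
                "assemblyId": assembly,
                "referenceName": chrom,
                "start": [start],
                "referenceBases": ref,
                "alternateBases": alt,
            },
            # Only the sex filter is shown; a disease filter would be appended here.
            "filters": [{"id": SEX_TERMS[sex]}],
            # Ask only for counts, so no individual-level data is returned.
            "requestedGranularity": "count",
        },
    }


def frequency_ratio(carriers_a, total_a, carriers_b, total_b):
    """Ratio of carrier frequencies in group a relative to group b."""
    return (carriers_a / total_a) / (carriers_b / total_b)


if __name__ == "__main__":
    # Placeholder variant, not a real PPMI marker.
    query = build_variant_query("1", 1000000, "A", "G", "male")
    print(json.dumps(query, indent=2))
    # Hypothetical counts: 28/200 carriers in males vs 20/200 in females.
    print(round(frequency_ratio(28, 200, 20, 200), 2))
```

Requesting count granularity keeps the exchange at the aggregate level. This is consistent with the abstract's point that genomic data itself is never handed to the LLM.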
AskBeacon is available at https://github.com/aehrc/AskBeacon.