Nagarajan Vijayaraj, Shi Guangpu, Horai Reiko, Yu Cheng-Rong, Gopalakrishnan Jaanam, Yadav Manoj, Liew Michael H, Gentilucci Calla, Caspi Rachel R
Laboratory of Immunology, National Eye Institute, NIH, Bethesda 20892, USA.
Molecular Immunology Section, National Eye Institute, NIH, Bethesda 20892, USA.
bioRxiv. 2025 Mar 10:2025.03.06.640921. doi: 10.1101/2025.03.06.640921.
IAN is an R package that uses a multi-agent artificial intelligence (AI) system to address the challenge of integrating, analyzing, and interpreting high-throughput "omics" data. IAN leverages popular pathway and regulatory datasets (KEGG, WikiPathways, Reactome, GO, ChEA) and the STRING database of protein-protein interactions to perform standard enrichment analysis. A large language model (LLM), working through a multi-agent architecture, then generates a summary of the enrichment results for each dataset. Guided by carefully engineered prompts and grounding instructions, the LLM integrates and interprets these summaries in context to provide explanations, a system overview, key regulators, novel observations, and more. We demonstrate IAN's potential to facilitate biological discovery from complex omics data by reanalyzing two previously published datasets and evaluating the results. We also show that IAN performs remarkably well at avoiding hallucination. The IAN package, along with installation instructions and example usage, is available at https://github.com/NIH-NEI/IAN.
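The "standard enrichment analysis" step mentioned in the abstract is commonly an over-representation test based on the hypergeometric distribution. The following is a minimal sketch of that general technique in Python, not IAN's own implementation (IAN is an R package, and the abstract does not specify which test it uses). The function names `hypergeom_sf` and `enrich`, the toy gene identifiers, and the background size are all hypothetical, and multiple-testing correction is omitted for brevity.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of genes in the background
    successes:  number of background genes that belong to the gene set
    draws:      number of genes in the query list
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def enrich(query_genes, gene_sets, background_size):
    """Rank gene sets by over-representation p-value (hypothetical helper).

    Returns a list of (set_name, overlap_size, p_value), smallest p first.
    Assumes every query gene and every set member is in the background.
    """
    query = set(query_genes)
    results = []
    for name, members in gene_sets.items():
        members = set(members)
        overlap = len(query & members)
        p_value = hypergeom_sf(overlap, background_size, len(members), len(query))
        results.append((name, overlap, p_value))
    return sorted(results, key=lambda row: row[2])


# Toy data for illustration only; these are not real pathways or genes.
toy_sets = {
    "pathway_A": ["G1", "G2", "G3", "G4"],
    "pathway_B": ["G5", "G6", "G7", "G8"],
}
toy_query = ["G1", "G2", "G3", "G9"]
ranked = enrich(toy_query, toy_sets, background_size=20)
```

In a pipeline like the one the abstract describes, a ranked table of this kind would be produced once per resource (for example, one for KEGG and one for Reactome) and then passed to the LLM agents for summarization.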