Tiwari Krishna, Matthews Lisa, May Bruce, Shamovsky Veronica, Orlic-Milacic Marija, Rothfels Karen, Ragueneau Eliot, Gong Chuqiao, Stephan Ralf, Li Nancy, Wu Guanming, Stein Lincoln, D'Eustachio Peter, Hermjakob Henning
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
bioRxiv. 2023 Nov 8:2023.11.08.566195. doi: 10.1101/2023.11.08.566195.
Given the rapid advancement and ubiquity of generative AI, particularly ChatGPT, a chatbot built on large language models such as GPT, we explored the potential application of ChatGPT to the data collection and annotation stages of the Reactome curation process. The aim was to create an automated or semi-automated framework that reduces the extensive manual effort traditionally required to gather and annotate information on biological pathways, following Reactome's "reaction-centric" approach. In this pilot study, we used ChatGPT/GPT4 to address gaps in pathway annotation and enrichment in parallel with the conventional manual curation process. This allowed a comparative analysis in which we assessed the outputs generated by ChatGPT against manually extracted information. The primary objective of the comparison was to ascertain the efficiency of integrating ChatGPT or other large language models into the Reactome curation workflow and to help plan our annotation pipeline, ultimately improving our protein-to-pathway associations in a reliable and automated or semi-automated way. In the process, we identified promising capabilities and inherent challenges associated with the use of ChatGPT/GPT4, both in general and specifically in the context of Reactome curation. We describe approaches and tools for refining the output of ChatGPT/GPT4 so that it is more accurate and detailed.