Niyonkuru Enock, Caufield J Harry, Carmody Leigh C, Gargano Michael A, Toro Sabrina, Whetzel Patricia L, Blau Hannah, Gomez Mauricio Soto, Casiraghi Elena, Chimirri Leonardo, Reese Justin T, Valentini Giorgio, Haendel Melissa A, Mungall Christopher J, Robinson Peter N
Trinity College, Hartford, CT, USA.
The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
medRxiv. 2024 Aug 22:2024.08.22.24310814. doi: 10.1101/2024.08.22.24310814.
Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10,000 rare diseases. We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration for rare diseases. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, the Human Phenotype Ontology (HPO), and the MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process 4,918 unique medical abstracts and identify annotations for 21 rare genetic diseases. We extracted 18,631 candidate disease-treatment curations, 538 of which were confirmed and transferred to the MAxO annotation dataset. The results of this project underscore the potential of generative AI to accelerate precision medicine by enabling robust and comprehensive curation of the primary literature to represent information about diseases and procedures in a structured fashion. Although we focused on MAxO in this project, similar approaches could be taken for other biomedical curation tasks.
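The three stages described in the abstract (candidate extraction, ontology matching, curator review) can be illustrated with a minimal Python sketch. This is not the AutoMAxO implementation: every name here (`Candidate`, `extract_candidates`, `ground`, `review_queue`, `fake_llm`, `TOY_LABELS`) is hypothetical, the ontology IDs are placeholders rather than real MAxO identifiers, the LLM is replaced by a stub callable, and fuzzy string matching from the standard library stands in for the LLM-plus-post-processing matching step.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical label -> ID lookup. The IDs are placeholders, not real ontology identifiers.
TOY_LABELS: Dict[str, str] = {
    "enzyme replacement therapy": "MAXO:PLACEHOLDER_1",
    "physical therapy": "MAXO:PLACEHOLDER_2",
}


@dataclass
class Candidate:
    disease: str                   # free-text disease mention
    action: str                    # free-text medical action mention
    source_id: str                 # identifier of the abstract the pair came from
    term_id: Optional[str] = None  # filled in by the matching step
    approved: bool = False         # set only by a human curator


def extract_candidates(
    abstract: str,
    source_id: str,
    llm: Callable[[str], List[Tuple[str, str]]],
) -> List[Candidate]:
    """Stage 1: ask an LLM-like callable for (disease, action) pairs."""
    return [Candidate(disease, action, source_id) for disease, action in llm(abstract)]


def ground(candidate: Candidate, labels: Dict[str, str], cutoff: float = 0.8) -> Candidate:
    """Stage 2 (stand-in): map the free-text action to the closest known label."""
    hits = get_close_matches(candidate.action.lower(), list(labels), n=1, cutoff=cutoff)
    candidate.term_id = labels[hits[0]] if hits else None
    return candidate


def review_queue(candidates: List[Candidate]) -> List[Candidate]:
    """Stage 3: only matched candidates are shown to the curator; none is pre-approved."""
    return [c for c in candidates if c.term_id is not None]


def fake_llm(text: str) -> List[Tuple[str, str]]:
    """Stub standing in for a real LLM call; returns fixed example pairs."""
    return [
        ("Example disease", "Enzyme replacement therapies"),
        ("Example disease", "watchful waiting"),
    ]


candidates = extract_candidates("(abstract text)", "PMID:placeholder", fake_llm)
queue = review_queue([ground(c, TOY_LABELS) for c in candidates])
for item in queue:
    print(item.disease, "|", item.action, "|", item.term_id)
```

In this sketch the `approved` flag is never set by code, which mirrors the abstract's point that a human curator makes the final decision on each candidate.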