Niyonkuru Enock, Caufield J Harry, Carmody Leigh C, Gargano Michael A, Toro Sabrina, Whetzel Patricia L, Blau Hannah, Soto Gomez Mauricio, Casiraghi Elena, Chimirri Leonardo, Reese Justin T, Valentini Giorgio, Haendel Melissa A, Mungall Christopher J, Robinson Peter N
Trinity College, Hartford, CT 06106, United States.
The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States.
Bioinform Adv. 2025 Jun 12;5(1):vbaf141. doi: 10.1093/bioadv/vbaf141. eCollection 2025.
Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, rare diseases are annotated with MAxO terms by manual biocuration. However, it is difficult to scale manual curation to capture information about medical actions comprehensively for the more than 10 000 rare diseases.
We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, the Human Phenotype Ontology (HPO), and the Mondo Disease Ontology (MONDO) via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process abstracts related to 37 rare genetic diseases and identified 958 novel treatment annotations, which were added to the MAxO annotation dataset.
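The three stages described above (LLM extraction, ontology matching, curator review) can be illustrated with a minimal sketch. This is not AutoMAxO's code or API: it assumes the LLM returns a JSON list of disease/action/phenotype records, and it replaces the LLM-assisted matching with a plain normalized exact-label lookup. The label index, its placeholder IDs (e.g. `MAXO:EXAMPLE_1`), and all function and class names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical label -> ID index. The IDs are placeholders, NOT real ontology terms.
LABEL_INDEX = {
    "maxo": {"enzyme replacement therapy": "MAXO:EXAMPLE_1"},
    "hpo": {"hepatomegaly": "HP:EXAMPLE_1"},
    "mondo": {"gaucher disease": "MONDO:EXAMPLE_1"},
}


@dataclass
class Candidate:
    """One candidate curation extracted by the LLM from an abstract (assumed shape)."""
    disease: str
    action: str
    phenotype: str


@dataclass
class ReviewRow:
    """A candidate plus its matched term IDs (None = no match), shown to a curator."""
    candidate: Candidate
    disease_id: Optional[str]
    action_id: Optional[str]
    phenotype_id: Optional[str]

    @property
    def fully_matched(self) -> bool:
        return None not in (self.disease_id, self.action_id, self.phenotype_id)


def normalize(label: str) -> str:
    # Post-processing step: lowercase and collapse runs of whitespace.
    return " ".join(label.lower().split())


def parse_llm_output(raw: str) -> list[Candidate]:
    # Assumes the LLM was prompted to answer with a JSON list of objects.
    records = json.loads(raw)
    return [Candidate(r["disease"], r["action"], r["phenotype"]) for r in records]


def match(label: str, ontology: str) -> Optional[str]:
    # Exact lookup on the normalized label; a stand-in for LLM-assisted matching.
    return LABEL_INDEX[ontology].get(normalize(label))


def build_review_table(raw: str) -> list[ReviewRow]:
    # Match every candidate against MONDO, MAxO, and HPO for curator approval.
    return [
        ReviewRow(
            candidate=c,
            disease_id=match(c.disease, "mondo"),
            action_id=match(c.action, "maxo"),
            phenotype_id=match(c.phenotype, "hpo"),
        )
        for c in parse_llm_output(raw)
    ]


if __name__ == "__main__":
    example = json.dumps([
        {"disease": "Gaucher disease", "action": "Enzyme  replacement therapy",
         "phenotype": "Hepatomegaly"},
        {"disease": "Gaucher disease", "action": "splenectomy",
         "phenotype": "Splenomegaly"},
    ])
    for row in build_review_table(example):
        status = "ready for review" if row.fully_matched else "needs manual mapping"
        print(row.candidate.action, "->", row.action_id, f"({status})")
```

Keeping unmatched candidates in the table, rather than discarding them, reflects the semi-automated design: the curator sees every extraction and decides which ones to approve or map by hand.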
AutoMAxO is a Python package freely available at https://github.com/monarch-initiative/automaxo.