Hsu Szeling, Qu Sue, Xu Yanji, Zhu Qian
Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, USA.
Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2022 Dec;2022:3260-3262. doi: 10.1109/bibm55620.2022.9994880. Epub 2023 Jan 2.
With advances in science and technology, the volume of research on rare diseases has increased dramatically over the past twenty years. Systematically accessing the research projects funded by NIH would allow us to assess the current status of research and to identify the research gaps that remain in this area, which in turn might inspire new research to bridge those gaps. We previously developed a knowledge graph that semantically represents NIH-funded rare disease research projects by analyzing project titles. To expand the use of NIH funding data, in this study we extended the previous work in two ways: 1) we applied NormMap, an NLP package developed in-house, to identify rare disease related projects; and 2) we semantically annotated project titles and abstracts with biomedical concepts from the UMLS to illustrate the project aims. As the next step, the rich information extracted from NIH funding data via semantic annotation will be used to develop an updated version of the knowledge graph to advance rare disease research.
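The two steps described in the abstract (identifying rare disease related projects, then annotating titles and abstracts with concepts) can be illustrated with a toy sketch. This is not NormMap, whose interface is not described here; it is a simple normalize-and-look-up matcher written only for illustration. The term dictionary, the `RD:` identifiers, the function names, and the project records are all hypothetical, and the identifiers are placeholders rather than UMLS concept identifiers.

```python
import re

# Hypothetical term dictionary: normalized disease name -> placeholder concept ID.
# The IDs are made up for illustration; they are not UMLS or GARD identifiers.
RARE_DISEASE_TERMS = {
    "cystic fibrosis": "RD:0001",
    "huntington disease": "RD:0002",
    "duchenne muscular dystrophy": "RD:0003",
}


def normalize(text):
    """Lowercase, replace runs of non-alphanumerics with one space, trim the ends."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def annotate(text, terms=RARE_DISEASE_TERMS):
    """Return sorted (term, concept_id) pairs whose term occurs as a whole phrase."""
    padded = " " + normalize(text) + " "
    return sorted((t, cid) for t, cid in terms.items() if " " + t + " " in padded)


def rare_disease_projects(projects):
    """Keep projects whose title or abstract mentions at least one dictionary term."""
    hits = {}
    for p in projects:
        found = annotate(p["title"] + " " + p["abstract"])
        if found:
            hits[p["id"]] = found
    return hits


# Invented project records, used only to exercise the functions above.
projects = [
    {"id": "P1", "title": "Gene therapy for Cystic Fibrosis",
     "abstract": "We study CFTR correction."},
    {"id": "P2", "title": "Deep learning for image segmentation",
     "abstract": "A general method."},
    {"id": "P3", "title": "Biomarkers in Duchenne muscular-dystrophy",
     "abstract": "Aims include a comparison with Huntington disease."},
]
result = rare_disease_projects(projects)
```

Here `result` keeps P1 and P3 with their matched terms and drops P2. A real pipeline would presumably need to handle synonyms, abbreviations, and ambiguous terms, which exact phrase matching does not address.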