Ma Hetong, Shen Liu, Wang Jiayang, Wang Shilong, Wang Min, Wang Meng, Li Zixiao, Li Jiao
Intelligent Computing Department, Institute of Medical Information & Library, Chinese Academy of Medical Sciences/Peking Union Medical College, No. 3 Yabao Road, Beijing 100020, China.
Intelligent Computing Department, Chinese Academy of Medical Sciences/Peking Union Medical College, No. 3 Yabao Road, Beijing 100020, China.
Database (Oxford). 2024 Dec 5;2024. doi: 10.1093/database/baae117.
Atherosclerotic cerebrovascular disease causes a great number of deaths and disabilities, yet it has not received enough attention. Little information, statistics, or data on the disease have been published, and no systematic concept dataset has been released to help clinicians clarify the scope, assist research, and deliver maximum value. This study aimed to develop a cross-lingual atherosclerotic cerebrovascular disease ontology; describe its workflow, schema, hierarchical structure, and highlighted content; design a brand-new rehabilitation ontology; evaluate the ontology; and illustrate its application in real-world scenarios. We implemented nine steps based on the Ontology Development 101 methodology combined with expert opinions: collection and specification of clinical requirements, background investigation and knowledge acquisition, ontology selection and reuse, scope identification, schema definition, concept extraction, concept extension, ontology verification, and ontology evaluation. We evaluated the proposed ontology on a literature classification task. The current ontology includes 10 top-level classes: clinical manifestation, comorbidity, complication, diagnosis, model of atherosclerotic cerebrovascular disease, pathogenesis, prevention, rehabilitation, risk factor, and treatment. There are 1715 concepts in the 11-level ontology, covering 4588 Chinese terms, 6617 English terms, and 972 definitions. The ontology can be applied in real-world scenarios such as information retrieval, new expression discovery, named entity recognition, and knowledge fusion, and the use case showed that it could offer satisfactory support for related medical scenarios. The ontology also proved useful in text classification tasks, where the weighted F1 score could reach >80% when combined with a pretrained model.
The proposed ontology provided a clear set of cross-lingual concepts and terms with an explicit hierarchical structure, helping scientific researchers to quickly retrieve relevant medical literature, assisting data scientists to efficiently identify relevant contents in electronic health records, and providing a clear domain framework for academic reference. Database URL: https://bioportal.bioontology.org/ontologies/ACVD_ONTOLOGY.
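The weighted F1 score reported for the text classification task can be illustrated with a minimal sketch. It assumes the usual definition of the metric: the per-class F1 scores averaged with weights proportional to each class's share of the true labels. The abstract does not describe the authors' evaluation code, so the function name `weighted_f1` and the toy labels below (borrowed from the ontology's top-level class names) are hypothetical and for illustration only.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += (n / total) * f1

    return score


# Toy data, not taken from the study.
y_true = ["treatment", "treatment", "diagnosis", "risk factor"]
y_pred = ["treatment", "diagnosis", "diagnosis", "risk factor"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
```

Because the weights follow class frequency, a frequent class influences the score more than a rare one, which is why this metric is commonly reported for imbalanced literature classification sets.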