Hishiki Teruyoshi, Ogasawara Osamu, Tsuruoka Yoshimasa, Okubo Kousaku
Biological Information Research Center, National Institute of Advanced Industrial Science and Technology.
In Silico Biol. 2004;4(1):31-54. Epub 2003 Dec 28.
As a first step toward the quantitative comparison of clinical features of diseases, we indexed the text descriptions in the Clinical Synopsis section of the Online Mendelian Inheritance in Man (OMIM) with concepts for the body parts, organs, and tissues contained in the Metathesaurus of the Unified Medical Language System (UMLS). We also indexed the text with the diseases and disorders having links to body parts specified in the thesaurus. The vocabulary size was approximately 177,540 representations for 81,435 concepts, and 2,161 concepts were indexed to 3,779 OMIM entries. The indexed concepts included 134 concepts for the noun forms of anatomical concepts and 985 indexed concepts for diseases and disorders that were linked to 132 and 408 anatomical concepts, respectively. We report herein that the retrieval of OMIM entries for diseases affecting specific organs can be made more comprehensive through the anatomical concepts indexed to the Clinical Synopsis or linked to the indexed concepts, as compared to simply matching organ names to the Clinical Synopsis text. The recall and precision of identifying relevant body parts in the Clinical Synopsis were calculated as 78% and 92.5%, respectively, based on random sampling. The examination of the unidentified body parts due to lack of indexed diseases and disorders showed that although most of the concepts for diseases and disorders were contained in the Metathesaurus, their relations to body parts were not. The indexing result proved the effectiveness of the Metathesaurus as a resource for the identification of concepts indicating body parts, diseases, and disorders.
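The indexing and evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The vocabulary, the concept identifiers (`A001`, `D001`, and so on), the disease-to-anatomy links, and the function names `index_text` and `precision_recall` are all hypothetical stand-ins for the UMLS Metathesaurus data, and matching is reduced to single lowercase tokens. The sketch shows two things: a body part can be found either by a direct term match or through a disease concept linked to it, and a disease with no recorded link to a body part lowers recall.

```python
import re

# Toy vocabulary: surface form -> concept ID.
# IDs are invented for illustration; they are not real UMLS identifiers.
ANATOMY_TERMS = {
    "heart": "A001",
    "cardiac": "A001",
    "kidney": "A002",
    "renal": "A002",
    "liver": "A003",
}
DISEASE_TERMS = {
    "cardiomyopathy": "D001",
    "nephritis": "D002",
    "hepatomegaly": "D003",
}
# Disease concept -> linked anatomical concepts (hypothetical links).
# D003 has no entry on purpose, to mimic a relation missing from the thesaurus.
DISEASE_TO_ANATOMY = {
    "D001": {"A001"},
    "D002": {"A002"},
}


def index_text(text):
    """Return anatomical concept IDs found directly or via linked diseases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    direct = {ANATOMY_TERMS[t] for t in tokens if t in ANATOMY_TERMS}
    diseases = {DISEASE_TERMS[t] for t in tokens if t in DISEASE_TERMS}
    linked = set()
    for disease in diseases:
        linked |= DISEASE_TO_ANATOMY.get(disease, set())
    return direct | linked


def precision_recall(predicted, relevant):
    """Compute precision and recall of predicted concepts against a gold set."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented synopsis text and a hand-assigned gold standard for it.
synopsis = "Dilated cardiomyopathy; renal failure; hepatomegaly"
found = index_text(synopsis)
gold = {"A001", "A002", "A003"}
p, r = precision_recall(found, gold)
print(sorted(found), p, r)
```

In this toy case the liver concept is missed because the disease term is recognized but has no link to a body part, which is the kind of gap the abstract attributes to relations absent from the Metathesaurus.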