Sarkar I N, Cantor M N, Gelman R, Hartel F, Lussier Y A
Department of Medical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA.
Pac Symp Biocomput. 2003:439-50. doi: 10.1142/9789812776303_0041.
Integrating informatics terminologies will be essential to advancing both the biomedical and clinical sciences. The Gene Ontology (GO) Consortium has developed an extensive collection of biomedical terms describing genes and proteins across a variety of organisms. The Unified Medical Language System (UMLS), pioneered by the National Library of Medicine, is a composite collection of medical terminologies. In the present study, we examine several techniques for mapping terms from one terminology (GO) to another (UMLS) and describe their performance on a small, curated data set obtained from the National Cancer Institute. Precision ranged from 30% (at 100% recall) to 95% (at 74% recall). Based on each technique's performance, we comment on how it can be used to enrich an existing terminology (UMLS) in future studies and on how linking biological terminologies to UMLS differs from linking medical terminologies.
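The trade-off between precision and recall reported in the abstract can be illustrated with a minimal sketch. The abstract does not name the mapping techniques that were evaluated, so the two shown here (exact string matching and matching on a normalized, word-order-insensitive key) are assumptions chosen only for illustration. The function names, the toy terms, and the identifiers `T1`, `T2`, `T3`, and `T9` are hypothetical and are not real GO or UMLS data.

```python
import re


def normalize(term):
    """Lowercase, drop punctuation, and sort tokens (a crude, assumed normalization)."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(tokens))


def map_terms(source_terms, target_index, key=lambda t: t):
    """Map each source term to the target concept that shares the same key, if any."""
    lookup = {key(name): concept_id for name, concept_id in target_index.items()}
    return {t: lookup[key(t)] for t in source_terms if key(t) in lookup}


def precision_recall(proposed, gold):
    """Precision = correct / proposed mappings; recall = correct / gold mappings."""
    correct = sum(1 for t, cid in proposed.items() if gold.get(t) == cid)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical source terms (GO-like) and target names (UMLS-like).
source_terms = ["cell death", "DNA binding", "cell cycle arrest", "transport"]
target_index = {
    "cell death": "T1",
    "binding, DNA": "T2",
    "Cell Cycle Arrest": "T3",
    "Transport": "T9",  # assumed to carry a different (clinical) sense
}
# Hypothetical curated mappings; "transport" has no valid target here.
gold = {"cell death": "T1", "DNA binding": "T2", "cell cycle arrest": "T3"}

exact = precision_recall(map_terms(source_terms, target_index), gold)
loose = precision_recall(map_terms(source_terms, target_index, key=normalize), gold)

print("exact:      precision=%.2f recall=%.2f" % exact)
print("normalized: precision=%.2f recall=%.2f" % loose)
```

In this toy case the stricter technique proposes fewer mappings, all of them correct, while the looser technique recovers every curated mapping at the cost of one spurious match. This mirrors the direction of the trade-off in the reported figures, not their magnitude.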