He Zhe, Chen Yan, Geller James
School of Information, Florida State University, Tallahassee, FL, USA.
Department of Computer Information Systems, BMCC, City University of New York, New York, USA.
Stud Health Technol Inform. 2017;245:863-867.
The National Cancer Institute Thesaurus (NCIt), developed and maintained by the National Cancer Institute, is an important reference terminology in the cancer domain. Because a controlled terminology must continuously incorporate new concepts to enrich its conceptual content, automated and semi-automated methods for identifying potential new concepts are in high demand. We have previously developed a topological-pattern-based method that uses the UMLS Metathesaurus to identify concepts in one controlled terminology that could enrich another. In this work, we apply this method with the National Cancer Institute Metathesaurus to identify new concepts for NCIt. Whereas the previous work was oriented only towards identifying candidate import concepts for human review, we now also add an algorithmic method to evaluate candidate concepts and reject a well-defined group of them.