López-García Pablo, Lependu Paea, Musen Mark, Illarramendi Arantza
Stanford Center for Biomedical Informatics Research, Stanford University, Medical School Office Building, Room X-215, 1265 Welch Road, Stanford, CA 94305-5479, USA; Department of Computer Languages and Systems, University of the Basque Country UPV/EHU, Manuel de Lardizabal 1, 20018 Donostia-San Sebastián, Spain.
Stanford Center for Biomedical Informatics Research, Stanford University, Medical School Office Building, Room X-215, 1265 Welch Road, Stanford, CA 94305-5479, USA.
J Biomed Inform. 2014 Feb;47:105-11. doi: 10.1016/j.jbi.2013.09.011. Epub 2013 Oct 1.
The benefits of using ontology subsets rather than full ontologies are well documented for many applications. In this study, we propose an efficient approach for extracting a subset for a given domain using a biomedical ontology repository with mappings, a cross-ontology, and a source subset from a related domain. As a case study, we extracted a subset of drugs from RxNorm using the UMLS Metathesaurus, the NDF-RT cross-ontology, and the CORE problem list subset of SNOMED CT. The extracted subset, which we termed RxNorm/CORE, was 4% of the size of the full RxNorm (0.4% when considering ingredients only). For evaluation, we used CORE and RxNorm/CORE as thesauri for the annotation of clinical documents and compared their performance to that of their respective full ontologies (i.e., SNOMED CT and RxNorm). The wide range in recall of both CORE (29-69%) and RxNorm/CORE (21-35%) suggests that more quantitative research is needed to assess the benefits of using ontology subsets as thesauri in annotation applications. Our approach to subset extraction, however, opens the door to creating other types of clinically useful domain-specific subsets and offers an alternative in scenarios where well-established subset extraction techniques run into difficulties or cannot be applied.
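The general idea of the approach can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which mappings or cross-ontology relations were followed, so the sketch assumes a simple three-hop traversal (source subset concept, to cross-ontology concept, to related cross-ontology concept, to target ontology concept). The function names `extract_subset` and `relative_recall`, all identifiers in the toy data, and the definition of recall as the share of the full ontology's annotations that the subset also produces are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


def extract_subset(
    source_subset: Iterable[str],
    source_to_cross: Dict[str, Set[str]],
    cross_relations: Dict[str, Set[str]],
    cross_to_target: Dict[str, Set[str]],
) -> Set[str]:
    """Collect target concepts reachable from the source subset.

    Assumed traversal: source concept -> cross-ontology concept
    -> related cross-ontology concept -> target ontology concept.
    """
    target_subset: Set[str] = set()
    for source_id in source_subset:
        for cross_id in source_to_cross.get(source_id, set()):
            for related_id in cross_relations.get(cross_id, set()):
                target_subset |= cross_to_target.get(related_id, set())
    return target_subset


def relative_recall(subset_hits: Set[str], full_hits: Set[str]) -> float:
    """Share of the full ontology's annotations that the subset also finds.

    Assumed definition; returns 0.0 when the full ontology finds nothing.
    """
    if not full_hits:
        return 0.0
    return len(subset_hits & full_hits) / len(full_hits)


# Toy data: every identifier below is hypothetical, not a real code.
core_problems = {"problem:hypertension", "problem:asthma"}
problem_to_disease = {
    "problem:hypertension": {"disease:HTN"},
    "problem:asthma": {"disease:ASTHMA"},
}
disease_to_drug = {
    "disease:HTN": {"xdrug:lisinopril", "xdrug:amlodipine"},
    "disease:ASTHMA": {"xdrug:albuterol"},
}
drug_to_target = {
    "xdrug:lisinopril": {"drug:lisinopril"},
    "xdrug:amlodipine": {"drug:amlodipine"},
    "xdrug:albuterol": {"drug:albuterol"},
}

subset = extract_subset(
    core_problems, problem_to_disease, disease_to_drug, drug_to_target
)
print(sorted(subset))

# Annotations a full drug thesaurus might produce on some documents (toy).
full_annotations = {
    "drug:lisinopril", "drug:albuterol", "drug:warfarin", "drug:insulin"
}
# The subset can only produce annotations for concepts it contains.
subset_annotations = full_annotations & subset
print(relative_recall(subset_annotations, full_annotations))
```

In this toy setting the subset contains three drugs and recovers two of the four annotations found with the full thesaurus. A small subset can miss a large share of annotations, which is consistent with the wide recall ranges reported above.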