Mougin Fleur, Burgun Anita, Bodenreider Olivier
EA 3888, IFR 140, Faculté de Médecine, Université de Rennes I, France.
BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2105-7-S3-S6.
Data integration is a crucial task in the biomedical domain, and integrating data sources is one approach to it. Data elements (DEs) in particular play an important role in data integration. We combine schema- and instance-based approaches to mapping DEs to terminological resources in order to facilitate the integration of data sources.
We extracted DEs from eleven disparate biomedical sources. We compared these DEs to concepts and/or terms in biomedical controlled vocabularies and to reference DEs. We also exploited DE values to disambiguate underspecified DEs and to identify additional mappings.
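The combination of the two approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the value pattern, and every function name below are hypothetical and chosen only for illustration. A DE name is first normalized and looked up among vocabulary terms (schema-based); if that fails, the DE's values are tested against a pattern associated with a concept (instance-based).

```python
import re

# Hypothetical toy vocabulary: normalized term -> concept label.
# A real terminological resource would be far larger.
VOCABULARY = {
    "gene symbol": "Gene Symbol",
    "organism": "Organism",
    "chromosome": "Chromosome",
}

# Hypothetical value patterns used to type a DE from its values.
VALUE_PATTERNS = {
    "Chromosome": re.compile(r"^(?:[1-9]|1[0-9]|2[0-2]|X|Y)$"),
}


def normalize(name):
    """Split camelCase, turn underscores/hyphens into spaces, lowercase."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    spaced = re.sub(r"[_\-]+", " ", spaced)
    return " ".join(spaced.lower().split())


def map_by_name(name):
    """Schema-based step: exact match of the normalized DE name."""
    return VOCABULARY.get(normalize(name))


def map_by_values(values):
    """Instance-based step: accept a concept only if every value fits."""
    for concept, pattern in VALUE_PATTERNS.items():
        if values and all(pattern.match(v) for v in values):
            return concept
    return None


def map_data_element(name, values):
    """Try the DE name first, then fall back to the DE values."""
    return map_by_name(name) or map_by_values(values)
```

In this sketch, an underspecified name such as `chr` finds no match in the vocabulary, but values like `1`, `X`, and `22` would still let it be typed as a chromosome; a DE with neither a matching name nor typable values stays unmapped.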
Of the 474 DEs studied, 82.5% were mapped to entries of a terminological resource, and 74.7% of the whole set could be associated with reference DEs. Only 6.6% of the DEs had values that could be semantically typed.
Our study suggests that the integration of biomedical sources can be achieved automatically, albeit with limited precision, and is largely facilitated by mapping DEs to terminological resources.