Anguita Alberto, Martín Luis, Crespo José, Tsiknakis Manolis
Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain.
Stud Health Technol Inform. 2008;136:3-8.
The increasing amount of information available for biomedical research has led to issues related to knowledge discovery in large collections of data. Moreover, information retrieval techniques must account for heterogeneities among databases that originally belonged to different domains, e.g. clinical and genetic data. One of the goals of the ACGT European project is to provide seamless and homogeneous access to integrated databases. In this work, we describe an approach to overcome heterogeneities in identifiers inside queries. We present an ontology that classifies the most common semantic heterogeneities in identifiers, and a service that uses this ontology to address the problem following the described approach. Finally, we illustrate the solution by analysing a set of real queries.