Wimalaratne Sarala M, Bolleman Jerven, Juty Nick, Katayama Toshiaki, Dumontier Michel, Redaschi Nicole, Le Novère Nicolas, Hermjakob Henning, Laibe Camille
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot group, Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211 Geneve, Switzerland, Database Center for Life Science (DCLS), Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan, Stanford Center for Biomedical Informatics Research, Stanford University, CA 94305-5479, USA and Babraham Institute, Babraham Research Campus, Cambridge, CB22 3AT, UK.
Bioinformatics. 2015 Jun 1;31(11):1875-7. doi: 10.1093/bioinformatics/btv064. Epub 2015 Jan 31.
On the semantic web, and in the life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use its own Internationalized Resource Identifier (IRI) for what is conceptually the same resource or database record. This lack of correspondence between identifiers is a barrier to executing federated SPARQL queries across life science data.
We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data.
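The variant-generation idea can be illustrated with a minimal Python sketch. It is not the service's implementation: the URI templates, the `$id` placeholder convention, and the function names `extract_id` and `variants` are assumptions made for illustration, and the template list is not a copy of the Identifiers.org Registry.

```python
import re

# Illustrative URI templates for a single data collection.
# "$id" marks where the local identifier goes. These templates are
# assumptions for this sketch, not the actual registry contents.
PATTERNS = [
    "http://identifiers.org/uniprot/$id",
    "http://purl.uniprot.org/uniprot/$id",
    "http://www.uniprot.org/uniprot/$id",
]


def extract_id(uri, template):
    """Return the local identifier if uri matches template, else None."""
    prefix, _, suffix = template.partition("$id")
    regex = re.escape(prefix) + r"(?P<id>[^/#?]+)" + re.escape(suffix)
    match = re.fullmatch(regex, uri)
    return match.group("id") if match else None


def variants(uri, templates=PATTERNS):
    """List the other URI forms that denote the same record as uri."""
    for template in templates:
        local_id = extract_id(uri, template)
        if local_id is not None:
            # Fill the identifier into every other known template.
            return [t.replace("$id", local_id) for t in templates if t != template]
    return []  # uri matches no known template


if __name__ == "__main__":
    for v in variants("http://purl.uniprot.org/uniprot/P05067"):
        print(v)
```

Given a source identifier in one form, the sketch returns the equivalent target forms, which is the kind of correspondence a federated query needs in order to join records across datasets.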
The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql.