Aoki-Kinoshita Kiyoko F, Kinjo Akira R, Morita Mizuki, Igarashi Yoshinobu, Chen Yi-An, Shigemoto Yasumasa, Fujisawa Takatomo, Akune Yukie, Katoda Takeo, Kokubu Anna, Mori Takaaki, Nakao Mitsuteru, Kawashima Shuichi, Okamoto Shinobu, Katayama Toshiaki, Ogishima Soichi
Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo, 192-8577 Japan.
Laboratory of Protein Informatics, Laboratory of Protein Databases, and Protein Data Bank Japan, Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, 565-0871 Japan.
J Biomed Semantics. 2015 Jan 7;6:3. doi: 10.1186/2041-1480-6-3. eCollection 2015.
Linked Data has recently gained attention in the life sciences as an effective way to provide and share data. As a part of the Semantic Web, data are linked so that a person or machine can explore the web of data. Resource Description Framework (RDF) is the standard means of implementing Linked Data. When RDF data are generated, the data are not simply linked to one another; the links themselves are characterized by ontologies, which allows the types of links to be distinguished. Although defining an ontology imposes a high labor cost on data providers, the merit lies in the higher level of interoperability with data analysis and visualization software. This increase in interoperability facilitates multi-faceted retrieval of data, so that the appropriate data can be quickly extracted and visualized. Such retrieval is usually performed with SPARQL (SPARQL Protocol and RDF Query Language), the language used to query RDF data stores. For the database provider, such interoperability should in turn lead to an increase in the number of users.
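As a rough illustration of the ideas above (typed links and pattern-based retrieval), the following minimal Python sketch models data as subject-predicate-object triples and matches them against a pattern, loosely in the spirit of a SPARQL basic graph pattern. It is not an RDF or SPARQL implementation; the `ex:` identifiers, the `TRIPLES` list, and the `match` helper are hypothetical names invented for this example and do not come from the BioHackathon work.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical example data: each link carries a type (the predicate),
# which is what lets different kinds of links be told apart.
TRIPLES: List[Triple] = [
    ("ex:protein1", "rdf:type", "ex:Protein"),
    ("ex:protein1", "ex:encodedBy", "ex:gene1"),
    ("ex:protein1", "ex:hasStructure", "ex:structure1"),
    ("ex:protein2", "rdf:type", "ex:Protein"),
    ("ex:protein2", "ex:encodedBy", "ex:gene2"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None plays the role of a variable."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Loosely analogous to the SPARQL pattern:
#   SELECT ?protein ?gene WHERE { ?protein ex:encodedBy ?gene }
encoded_by = [(subj, obj) for subj, _, obj in match(TRIPLES, p="ex:encodedBy")]
print(encoded_by)
```

Because every link is typed, the same data can be retrieved from several angles by changing only the pattern, for example `match(TRIPLES, s="ex:protein1")` to list everything known about one protein. A real deployment would use an RDF store and a SPARQL endpoint rather than a Python list.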
This manuscript describes the experiences and discussions shared among participants of the week-long BioHackathon 2011, who developed RDF representations of their own data along with specific RDF and SPARQL use cases. For bioinformaticians considering making their data available and interoperable, advice is provided on what to take into account when developing RDF representations of their data.
Participants of the BioHackathon 2011 were able to produce RDF representations of their data, and to gain a better understanding of the requirements for producing such data, in just five days. We summarize the work accomplished in the hope that it will be useful for researchers involved in developing laboratory databases or in data analysis, and for those who are considering technologies such as RDF and Linked Data.