Ian Harrow Consulting, UK.
Drug Discov Today. 2013 May;18(9-10):428-34. doi: 10.1016/j.drudis.2012.11.012. Epub 2012 Dec 12.
Research in the life sciences requires ready access to primary data, derived information and relevant knowledge from a multitude of sources. Integration and interoperability of such resources are crucial for sharing content across research domains relevant to the life sciences. In this article we present a perspective review of data integration, with emphasis on a semantics-driven approach that pushes content into a shared infrastructure, reduces data redundancy and clarifies inconsistencies. This enables much improved access to life science data from numerous primary sources. The Semantic Enrichment of the Scientific Literature (SESL) pilot project demonstrates the feasibility of using already available open semantic web standards and technologies to integrate public and proprietary data resources, which span structured and unstructured content. This has been accomplished through a precompetitive consortium, which provides a cost-effective approach for numerous stakeholders to work together to solve common problems.