Improving life sciences information retrieval using semantic web technology.
Author information
Dennis Quan
Affiliation
IBM, San Jose, CA 95141, USA.
Publication information
Brief Bioinform. 2007 May;8(3):172-82. doi: 10.1093/bib/bbm016. Epub 2007 May 25.
The ability to retrieve relevant information is at the heart of every aspect of research and development in the life sciences industry. Information is often distributed across multiple systems and recorded in a way that makes it difficult to piece together the complete picture. Differences in data formats, naming schemes and network protocols amongst information sources, both public and private, must be overcome, and user interfaces not only need to be able to tap into these diverse information sources but must also assist users in filtering out extraneous information and highlighting the key relationships hidden within an aggregated set of information. The Semantic Web community has made great strides in proposing solutions to these problems, and many efforts are underway to apply Semantic Web techniques to the problem of information retrieval in the life sciences space. This article gives an overview of the principles underlying a Semantic Web-enabled information retrieval system: creating a unified abstraction for knowledge using the RDF semantic network model; designing semantic lenses that extract contextually relevant subsets of information; and assembling semantic lenses into powerful information displays. Furthermore, the article provides concrete examples of how these principles can be applied to life science problems, including a scenario involving a drug discovery dashboard prototype called BioDash.
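The three principles named above can be illustrated with a minimal sketch. It is not the article's or BioDash's implementation. It uses plain Python tuples in place of an RDF library, and every name in it (Triple, Lens, display, and the "ex:" resources) is hypothetical toy data. Data from two differently shaped sources is assumed to be already converted to subject-predicate-object triples, so merging them is a set union. A "lens" is simplified here to a filter that keeps only the predicates relevant to one context, and a display is a set of lenses applied to the same resource.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object (all plain strings here)."""
    subject: str
    predicate: str
    obj: str


# Two hypothetical sources, assumed already converted to triples.
# The "ex:" prefix and all resources are made up for illustration.
source_a = {
    Triple("ex:CompoundX", "ex:inhibits", "ex:ProteinP"),
    Triple("ex:CompoundX", "ex:label", "Compound X"),
}
source_b = {
    Triple("ex:ProteinP", "ex:encodedBy", "ex:GeneG"),
    Triple("ex:CompoundX", "ex:assayResult", "ex:Assay42"),
}

# Unified abstraction: once every source speaks triples, merging is a set union.
graph = source_a | source_b


@dataclass(frozen=True)
class Lens:
    """Hypothetical semantic lens: keeps only the predicates relevant to one context."""
    name: str
    predicates: frozenset

    def view(self, graph, resource):
        # Select (predicate, object) pairs about `resource` that this lens cares about,
        # sorted so the result does not depend on set iteration order.
        return sorted(
            (t.predicate, t.obj)
            for t in graph
            if t.subject == resource and t.predicate in self.predicates
        )


def display(graph, resource, lenses):
    """Assemble several lenses into one display: lens name -> extracted facts."""
    return {lens.name: lens.view(graph, resource) for lens in lenses}


pharmacology = Lens("Pharmacology", frozenset({"ex:inhibits"}))
screening = Lens("Screening", frozenset({"ex:assayResult"}))

if __name__ == "__main__":
    for section, facts in display(graph, "ex:CompoundX", [pharmacology, screening]).items():
        print(section, facts)
```

Running the script prints one line per lens, showing only the facts each context selects about ex:CompoundX. A production system would more likely hold the merged graph in an RDF store and express lenses as queries. The predicate filter is used here only to keep the sketch self-contained.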