The Kno.e.sis Center, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, United States of America.
PLoS Negl Trop Dis. 2012 Jan;6(1):e1458. doi: 10.1371/journal.pntd.0001458. Epub 2012 Jan 17.
BACKGROUND: Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data drawn from both unpublished (internal) and public (external) sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, by identifying intervention targets in parasites and by informing the future direction of ongoing and planned projects. A key challenge in achieving this objective is the heterogeneity between internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. We therefore developed an integrated environment using Semantic Web technologies that may provide biologists with tools for managing and analyzing their data without requiring in-depth computer science knowledge.
METHODOLOGY/PRINCIPAL FINDINGS: We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which can be queried across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of the SPSE with example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answering these queries involves looking up multiple sources of data, linking them together and presenting the results.
CONCLUSION/SIGNIFICANCE: The SPSE helps parasitologists leverage the growing, but disparate, parasite data resources by offering an integrative platform that uses Semantic Web techniques, while keeping the increase in their workload minimal.
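The unified querying described in the methodology can be illustrated with a minimal sketch, which is not the SPSE implementation. It assumes a toy in-memory store of subject-predicate-object statements, each tagged with its source as a stand-in for provenance. The gene identifiers, the predicate names (`hasKnockoutStatus`, `expressedInStage`), the source labels, and the helper functions are all hypothetical and chosen only for illustration; the actual system uses OWL ontologies, RDF data and the Cuebee query tool.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement plus the source it came from (a stand-in for provenance)."""
    subject: str
    predicate: str
    obj: str
    source: str


# Hypothetical internal lab records (e.g. transcribed from a spreadsheet).
INTERNAL = [
    Triple("gene:G1", "hasKnockoutStatus", "attempted", "lab_records"),
    Triple("gene:G2", "hasKnockoutStatus", "not_attempted", "lab_records"),
    Triple("gene:G3", "hasKnockoutStatus", "not_attempted", "lab_records"),
]

# Hypothetical annotations from an external public database.
EXTERNAL = [
    Triple("gene:G1", "expressedInStage", "amastigote", "public_db"),
    Triple("gene:G2", "expressedInStage", "amastigote", "public_db"),
    Triple("gene:G3", "expressedInStage", "epimastigote", "public_db"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def candidate_knockout_targets(store: List[Triple], stage: str) -> List[str]:
    """Genes expressed in `stage` (external data) with no knockout attempt (internal data)."""
    expressed = {t.subject for t in match(store, predicate="expressedInStage", obj=stage)}
    untouched = {t.subject for t in match(store, predicate="hasKnockoutStatus",
                                          obj="not_attempted")}
    return sorted(expressed & untouched)


# Integration here is simply loading both sources into one store that
# shares identifiers, so a single query can span them.
knowledge_base = INTERNAL + EXTERNAL

if __name__ == "__main__":
    print(candidate_knockout_targets(knowledge_base, "amastigote"))
```

Because both sources use the same gene identifiers and every statement keeps its source tag, one query can join internal and external facts while still allowing each result to be traced back to where it came from. In a real deployment, shared ontology terms would play the role of these hand-aligned identifiers.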