Department of Computer Science, University of Miami, Coral Gables, FL, USA.
BMC Bioinformatics. 2011 Jun 24;12:257. doi: 10.1186/1471-2105-12-257.
High-throughput screening (HTS) is one of the main strategies for identifying novel entry points for the development of small-molecule chemical probes and drugs, and it is now commonly accessible to public sector research. Large amounts of data generated in HTS campaigns are submitted to public repositories such as PubChem, which is growing at an exponential rate. The diversity and quantity of available HTS assays and screening results pose enormous challenges to organizing, standardizing, integrating, and analyzing the datasets, and thus to maximizing the scientific and ultimately the public health impact of the huge investments made to implement public sector HTS capabilities. Novel approaches to organizing, standardizing, and accessing HTS data are required to address these challenges.
We developed the first ontology to describe HTS experiments and screening results using expressive description logic. The BioAssay Ontology (BAO) serves as a foundation for the standardization of HTS assays and data and as a semantic knowledge model. In this paper, we show important examples of formalizing HTS domain knowledge and point out the advantages of this approach. The ontology is available online at the NCBO BioPortal: http://bioportal.bioontology.org/ontologies/44531.
After a large manual curation effort, we loaded BAO-mapped data triples into an RDF database store and used a reasoner in several case studies to demonstrate the benefits of formalized domain knowledge representation in BAO. The examples illustrate semantic querying capabilities in which BAO enables the retrieval of inferred search results that are relevant to a given query but are not explicitly defined. BAO thus opens up new functionality for annotating, querying, and analyzing HTS datasets, and the potential for discovering new knowledge by means of inference.
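To make the idea of retrieving inferred results more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a toy set of triples and a single inference rule (transitive subclass membership); an actual description logic reasoner supports far more than this. All class names, assay identifiers, and function names are hypothetical illustrations and are not taken from BAO.

```python
# Toy illustration of inference-backed querying over triples.
# ASSUMPTION: every name below is hypothetical, not an actual BAO term.
SUBCLASS_OF = "subClassOf"
TYPE = "type"

triples = {
    ("LuciferaseAssay", SUBCLASS_OF, "LuminescenceAssay"),
    ("LuminescenceAssay", SUBCLASS_OF, "OpticalAssay"),
    ("FluorescenceAssay", SUBCLASS_OF, "OpticalAssay"),
    ("assay_1", TYPE, "LuciferaseAssay"),
    ("assay_2", TYPE, "FluorescenceAssay"),
    ("assay_3", TYPE, "RadiometricAssay"),
}


def superclasses(cls, triples):
    """Return cls together with every class reachable via subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples, infer=True):
    """Return subjects typed as cls, optionally following the class hierarchy."""
    result = set()
    for s, p, o in triples:
        if p != TYPE:
            continue
        # Without inference, only the explicitly asserted type counts.
        classes = superclasses(o, triples) if infer else {o}
        if cls in classes:
            result.add(s)
    return result
```

In this sketch, no assay is explicitly typed as `OpticalAssay`, so `instances_of("OpticalAssay", triples, infer=False)` finds nothing, while the inferring query returns `assay_1` and `assay_2` because their asserted types are subclasses of the queried class. This mirrors, in a very reduced form, the kind of result that is relevant to a query but not explicitly stated in the data.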