Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, 1 University of New Mexico, MSC09 5025, Albuquerque, NM 87131, USA.
Database (Oxford). 2013 Jun 21;2013:bat044. doi: 10.1093/database/bat044. Print 2013.
Many bioactivity databases offer information regarding the biological activity of small molecules on protein targets. Information in these databases is often hard to resolve with certainty because the databases cover different subsets of data in a variety of formats, use different bioactivity metrics, use different identifiers for chemicals and proteins, and must be accessed through different query interfaces. Given the multitude of data sources, interfaces and standards, it is challenging to gather relevant facts and make appropriate connections and decisions regarding chemical-protein associations. The CARLSBAD database has been developed as an integrated resource, focused on high-quality subsets from several bioactivity databases, which are aggregated and presented in a uniform manner, suitable for the study of the relationships between small molecules and targets. In contrast to data collection resources, CARLSBAD provides a single normalized activity value of a given type for each unique chemical-protein target pair. Two types of scaffold perception methods have been implemented and are available for data mining: HierS (hierarchical scaffolds) and MCES (maximum common edge subgraph). The 2012 release of CARLSBAD contains 439 985 unique chemical structures, mapped onto 1 420 889 unique bioactivities, and annotated with 277 140 HierS scaffolds and 54 135 MCES chemical patterns. Of the 890 323 unique structure-target pairs curated in CARLSBAD, 13.95% are aggregated from multiple structure-target values: 94 975 are aggregated from two bioactivities, 14 544 from three, 7930 from four and 2214 from five. CARLSBAD captures bioactivities and tags for 1435 unique chemical structures of active pharmaceutical ingredients (i.e. 'drugs'). CARLSBAD processing resulted in a net data reduction of 17.3% for chemicals, 34.3% for bioactivities, 23% for HierS scaffolds and 25% for MCES patterns.
The CARLSBAD database supports a knowledge mining system that gives non-specialists novel, integrative ways of exploring chemical biology space for drug discovery and repurposing. Database URL: http://carlsbad.health.unm.edu/carlsbad/.
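The idea of keeping one normalized activity value of a given type per unique chemical-protein target pair can be illustrated with a minimal sketch. The abstract does not state how CARLSBAD normalizes or aggregates values, so everything here is an assumption made for illustration: the record layout, the hypothetical function names `to_p_value` and `aggregate`, the conversion from nM to a negative log10 molar value, and the use of the median as the aggregate.

```python
import math
from collections import defaultdict
from statistics import median


def to_p_value(value_nm):
    """Convert a concentration in nM to a negative log10 molar value.

    Assumption for illustration only; not CARLSBAD's documented normalization.
    """
    return 9.0 - math.log10(value_nm)


def aggregate(records):
    """Collapse records to one value per (chemical, target, activity type).

    Each record is a (chemical_id, target_id, activity_type, value_nm) tuple.
    Returns {key: (aggregated_value, number_of_source_values)}.
    The median is an assumed aggregate, chosen only to make the sketch concrete.
    """
    groups = defaultdict(list)
    for chem, target, act_type, value_nm in records:
        groups[(chem, target, act_type)].append(to_p_value(value_nm))
    return {key: (median(vals), len(vals)) for key, vals in groups.items()}


# Hypothetical records: two IC50 measurements for one pair, one Ki for another.
records = [
    ("CHEM_A", "TARGET_1", "IC50", 10.0),
    ("CHEM_A", "TARGET_1", "IC50", 1000.0),
    ("CHEM_A", "TARGET_2", "Ki", 100.0),
]
result = aggregate(records)
for key, (value, n_sources) in sorted(result.items()):
    print(key, round(value, 2), n_sources)
```

Keeping the number of source values next to each aggregate makes it possible to count how many pairs were built from two, three or more bioactivities, which is the kind of breakdown the abstract reports.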