Aaron Smalter Hall, Yunfeng Shan, Gerald Lushington, Mahesh Visvanathan
Bioinformatics Core Facility, University of Kansas, Lawrence, Kansas 66047, USA.
Comb Chem High Throughput Screen. 2013 Mar;16(3):189-98. doi: 10.2174/1386207311316030004.
Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines. These include chemical biology, wherein information about small-molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size but also in type, with many similarities among them and often subtle differences. The data formats available to describe and exchange these entities are likewise numerous. In general, each format is optimized for a particular purpose or database, so some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats are discussed in two groups: those describing chemical structures and those describing genomic/proteomic entities.