Karp P D
Artificial Intelligence Center, SRI International, Menlo Park, CA 94025.
Comput Appl Biosci. 1992 Aug;8(4):347-57. doi: 10.1093/bioinformatics/8.4.347.
This paper describes a publicly available knowledge base of the chemical compounds involved in intermediary metabolism. We consider the motivations for constructing a knowledge base of metabolic compounds, the methodology by which it was constructed, and the information that it currently contains. The knowledge base currently describes 981 compounds, listing for each: synonyms for its name, a systematic name, CAS registry number, chemical formula, molecular weight, chemical structure, and two-dimensional display coordinates for the structure. The Compound Knowledge Base (CompoundKB) illustrates several methodological principles that should guide the development of biological knowledge bases. I argue that biological datasets should be made available in multiple representations to increase their accessibility to end users, and I present multiple representations of the CompoundKB (knowledge base, relational database, and ASN.1 representations). I also analyze the general characteristics of these representations to provide an understanding of their relative advantages and disadvantages. Another principle is that the error rate of biological databases should be estimated and documented; this analysis is performed for the CompoundKB.
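As a rough illustration of the per-compound fields listed above, the following Python sketch models one record, flattens it into relational-style rows, and runs a simple consistency check of the kind that could feed an error-rate estimate. It is not the CompoundKB schema or the paper's method: the class name, the field names, the row layout, the tolerance, and the helper functions are all assumptions made for this example. The pyruvic acid values are ordinary textbook values, not data taken from the knowledge base. Structure and display coordinates are omitted for brevity.

```python
import re
from dataclasses import dataclass

# Rounded standard atomic weights for a few common elements (enough for this sketch).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007,
                 "O": 15.999, "P": 30.974, "S": 32.06}


@dataclass
class Compound:
    """Hypothetical record holding the textual fields named in the abstract."""
    name: str
    synonyms: list
    systematic_name: str
    cas_number: str
    formula: str
    molecular_weight: float


def formula_weight(formula):
    """Sum atomic weights for a simple formula such as 'C3H4O3' (no parentheses)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


def to_relational_rows(compound):
    """Flatten one record into a compound row plus one row per synonym (assumed layout)."""
    compound_row = (compound.cas_number, compound.name, compound.systematic_name,
                    compound.formula, compound.molecular_weight)
    synonym_rows = [(compound.cas_number, s) for s in compound.synonyms]
    return compound_row, synonym_rows


def weight_is_consistent(compound, tolerance=0.05):
    """Flag records whose stored weight disagrees with the weight implied by the formula."""
    return abs(formula_weight(compound.formula) - compound.molecular_weight) <= tolerance


# Illustrative record only; values are not taken from the CompoundKB.
pyruvic_acid = Compound(
    name="pyruvic acid",
    synonyms=["2-oxopropionic acid", "alpha-ketopropionic acid"],
    systematic_name="2-oxopropanoic acid",
    cas_number="127-17-3",
    formula="C3H4O3",
    molecular_weight=88.06,
)

compound_row, synonym_rows = to_relational_rows(pyruvic_acid)
print(compound_row)
print(synonym_rows)
print(weight_is_consistent(pyruvic_acid))
```

Keeping the synonyms in a separate row set mirrors the usual relational treatment of a multi-valued field, whereas a frame-style knowledge base can hold the list directly on the record. That difference is one of the trade-offs a comparison of representations would weigh.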