Protein data representation and query using optimized data decomposition.

Author information

Shindyalov I N, Bourne P E

Affiliation

San Diego Supercomputer Center, CA 92186-9784, USA. shindyal,

Publication information

Comput Appl Biosci. 1997 Oct;13(5):487-96. doi: 10.1093/bioinformatics/13.5.487.

DOI:10.1093/bioinformatics/13.5.487

PMID:9367122

Abstract

MOTIVATION

To provide data management tools to maintain and query efficiently experimental and derived protein data with the goal of providing new insights into structure-function relationships. The tools should be portable, extensible, and accessible locally, or via the World Wide Web, providing data that would not otherwise be available.

RESULTS

The initial phase of the work, the data representation and query of all available macromolecular structure data, including real-time access to complex property patterns based on the amino acid sequence, is reported. Protein structure data taken from the Protein Data Bank (PDB) are decomposed into native and derived elementary properties, and represented as compact indexed objects minimizing storage requirements and query time for select types of query. In addition, collections of indices representing a particular property are maintained and can be queried for specific property patterns found across the whole database. The approach is proving applicable to a wide variety of data available on specific protein families.
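The idea of decomposing entries into per-residue properties and querying an index for property patterns can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hydrophobic/polar classification, the `PropertyIndex` class, the k-mer seed lookup, and the entry identifiers are all assumptions chosen only to show one property collection being searched across every stored entry.

```python
from collections import defaultdict

# Hypothetical derived property: a coarse hydrophobic (H) / polar (P) class.
# The residue set below is an illustrative assumption, not taken from the paper.
HYDROPHOBIC = set("AVLIMFWC")


def derive_property(sequence):
    """Map an amino acid sequence to one property symbol per residue."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence)


class PropertyIndex:
    """One property stored for all entries, indexed by k-mers of its symbols."""

    def __init__(self, k=3):
        self.k = k
        self.values = {}                # entry id -> property string
        self.kmers = defaultdict(list)  # k-mer -> [(entry id, offset), ...]

    def add(self, entry_id, sequence):
        prop = derive_property(sequence)
        self.values[entry_id] = prop
        for i in range(len(prop) - self.k + 1):
            self.kmers[prop[i:i + self.k]].append((entry_id, i))

    def find(self, pattern):
        """Return (entry id, offset) for every occurrence of the pattern."""
        if len(pattern) < self.k:
            raise ValueError("pattern must be at least k symbols long")
        seed = pattern[:self.k]
        # Look up the seed k-mer, then verify the full pattern at each hit.
        return [
            (entry_id, offset)
            for entry_id, offset in self.kmers.get(seed, [])
            if self.values[entry_id].startswith(pattern, offset)
        ]


# Hypothetical entries; the identifiers and sequences are made up.
index = PropertyIndex(k=3)
index.add("entryA", "MKVLAAGT")
index.add("entryB", "GSTKAVL")
print(index.find("HHHP"))
```

Looking up a short seed in the index and verifying the rest of the pattern in place avoids scanning every entry for each query. That is one plausible way to read "minimizing query time for select types of query", although the paper's actual index layout may differ.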
