An optimized relational database for querying structural patterns in proteins.

Affiliations

Department of Computer Science, Faculty of Engineering, Universidad de Talca, Camino a Los Niches Km. 1, Curicó, Región del Maule 3340000, Chile.

Millennium Institute for Foundational Research on Data (IMFD), Vicuña Mackenna 4860, Macul, Santiago, Región Metropolitana 7810000, Chile.

Publication information

Database (Oxford). 2024 Jan 17;2024. doi: 10.1093/database/baad093.

DOI:10.1093/database/baad093

PMID:38236197

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10939390/

Abstract

A database is an essential component in almost any software system, and its creation involves more than just data modeling and schema design. It also includes query optimization and tuning. This paper focuses on a web system called GSP4PDB, which is used for searching structural patterns in proteins. The system utilizes a normalized relational database, which has proven to be inefficient even for simple queries. This article discusses the optimization of the GSP4PDB database by implementing two techniques: denormalization and indexing. The empirical evaluation described in the article shows that combining these techniques enhances the efficiency of the database when querying both real and artificial graph-based structural patterns.
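The two techniques named in the abstract can be illustrated with a small sketch. It is not the GSP4PDB schema or the authors' implementation: the tables (`protein`, `residue`, `contact`, `contact_flat`), the index name, the toy rows, and the use of SQLite are all assumptions made only for illustration. The idea shown is that a query over a normalized schema needs joins, while a denormalized table with a composite index can answer the same question from a single table.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical normalized tables (not the GSP4PDB schema).
        CREATE TABLE protein (id INTEGER PRIMARY KEY, code TEXT);
        CREATE TABLE residue (
            id INTEGER PRIMARY KEY,
            protein_id INTEGER REFERENCES protein(id),
            name TEXT
        );
        CREATE TABLE contact (
            res_a INTEGER REFERENCES residue(id),
            res_b INTEGER REFERENCES residue(id),
            distance REAL
        );
    """)
    # Toy rows, invented for the example.
    conn.execute("INSERT INTO protein VALUES (1, 'DEMO')")
    conn.executemany(
        "INSERT INTO residue VALUES (?, ?, ?)",
        [(1, 1, "HIS"), (2, 1, "CYS"), (3, 1, "ALA")],
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?)",
        [(1, 2, 3.1), (2, 3, 4.5), (1, 3, 6.0)],
    )
    # Denormalization: copy residue names into the contact rows so the
    # pattern query no longer needs to join against residue.
    conn.executescript("""
        CREATE TABLE contact_flat AS
            SELECT c.res_a, c.res_b,
                   a.name AS name_a, b.name AS name_b, c.distance
            FROM contact c
            JOIN residue a ON a.id = c.res_a
            JOIN residue b ON b.id = c.res_b;
        -- Indexing: composite index on the columns used in the filter.
        CREATE INDEX idx_contact_flat
            ON contact_flat (name_a, name_b, distance);
    """)
    return conn


def query_normalized(conn, name_a="HIS", name_b="CYS", max_dist=4.0):
    """Find residue pairs by name and distance using joins."""
    return conn.execute(
        """SELECT c.res_a, c.res_b
           FROM contact c
           JOIN residue a ON a.id = c.res_a
           JOIN residue b ON b.id = c.res_b
           WHERE a.name = ? AND b.name = ? AND c.distance <= ?""",
        (name_a, name_b, max_dist),
    ).fetchall()


def query_denormalized(conn, name_a="HIS", name_b="CYS", max_dist=4.0):
    """Same question answered from the single denormalized table."""
    return conn.execute(
        """SELECT res_a, res_b
           FROM contact_flat
           WHERE name_a = ? AND name_b = ? AND distance <= ?""",
        (name_a, name_b, max_dist),
    ).fetchall()
```

Both functions return the same rows on the toy data. The trade-off in this kind of design is that the denormalized table duplicates residue names and has to be rebuilt or kept in sync whenever the normalized tables change.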
