Oliveira S R M, Almeida G V, Souza K R R, Rodrigues D N, Kuser-Falcão P R, Yamagishi M E B, Santos E H, Vieira F D, Jardine J G, Neshich G
Embrapa Informática Agropecuária, Campinas, SP, Brasil.
Genet Mol Res. 2007 Oct 5;6(4):911-22.
An effective strategy for managing protein databases is to provide mechanisms that transform raw data into consistent, accurate and reliable information. Such mechanisms greatly reduce operational inefficiencies and improve one's ability to pursue scientific objectives and interpret research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it with efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in several respects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_RDB evolved from the earlier text-based (flat-file) database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and data mining. Third, a data quality indicator is introduced. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
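The abstract states that a relational design makes it easy to pose complex queries that a flat-file database could not answer. As a rough illustration of that kind of query, the sketch below joins two tables and filters on an aggregate. Everything in it is an assumption made for illustration: the table names (`residue`, `contact`), the column names, the sample rows, and the function names are hypothetical and are not taken from the actual Sting_RDB schema; an in-memory SQLite database stands in for whatever database engine Sting_RDB uses.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE residue (
            residue_id    INTEGER PRIMARY KEY,
            pdb_id        TEXT NOT NULL,
            chain         TEXT NOT NULL,
            res_num       INTEGER NOT NULL,
            res_name      TEXT NOT NULL,
            accessibility REAL NOT NULL      -- hypothetical per-residue parameter
        );
        CREATE TABLE contact (
            residue_id   INTEGER NOT NULL REFERENCES residue(residue_id),
            contact_type TEXT NOT NULL
        );
        """
    )
    # Made-up rows, not real structural data.
    conn.executemany(
        "INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "DEMO", "A", 10, "LEU", 2.5),
            (2, "DEMO", "A", 11, "LYS", 85.0),
            (3, "DEMO", "A", 12, "PHE", 0.8),
        ],
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?)",
        [
            (1, "hydrophobic"), (1, "hydrophobic"),
            (2, "hydrogen_bond"),
            (3, "hydrophobic"), (3, "aromatic"), (3, "hydrophobic"),
        ],
    )
    return conn


def buried_residues_with_contacts(conn, max_access, min_contacts):
    """Return (res_name, res_num, n_contacts) for residues whose accessibility
    is at most max_access and that have at least min_contacts contacts."""
    query = """
        SELECT r.res_name, r.res_num, COUNT(c.residue_id) AS n_contacts
        FROM residue AS r
        JOIN contact AS c ON c.residue_id = r.residue_id
        WHERE r.accessibility <= ?
        GROUP BY r.residue_id
        HAVING COUNT(c.residue_id) >= ?
        ORDER BY r.res_num
    """
    return conn.execute(query, (max_access, min_contacts)).fetchall()


conn = build_demo_db()
rows = buried_residues_with_contacts(conn, max_access=5.0, min_contacts=2)
print(rows)
```

A question such as "which buried residues make at least two contacts" combines a filter on one table with a count over another. In a flat-file layout this would typically require custom parsing code, whereas in SQL it is a single join with `GROUP BY` and `HAVING`.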