DWARF--a data warehouse system for analyzing protein families.

Author information

Fischer Markus, Thai Quan K, Grieb Melanie, Pleiss Jürgen

Affiliation

Institute of Technical Biochemistry, University of Stuttgart, Allmandring 31, D-70569 Stuttgart, Germany.

Publication information

BMC Bioinformatics. 2006 Nov 9;7:495. doi: 10.1186/1471-2105-7-495.

DOI:10.1186/1471-2105-7-495

PMID:17094801

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1647292/

Abstract

BACKGROUND

The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data, and thus makes it possible to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families.

DESCRIPTION

The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification into homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming, and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed through an interface for searching and browsing, and through analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of alpha/beta-hydrolases to host the Lipase Engineering Database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families.
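The three-section relational model described above (protein, sequence, structure) can be illustrated with a small sketch. It uses Python's standard sqlite3 module and an in-memory database. All table names, column names, and rows are hypothetical placeholders chosen for illustration; they are not the actual DWARF schema or Lipase Engineering Database content, and the abstract does not say which database engine DWARF uses.

```python
import sqlite3

# Hypothetical, simplified schema. Names are illustrative assumptions,
# not the actual DWARF data model.
SCHEMA = """
-- Protein section: classification, function, source organism
CREATE TABLE superfamily (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE homologous_family (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    superfamily_id INTEGER NOT NULL REFERENCES superfamily(id)
);
CREATE TABLE protein (
    id                   INTEGER PRIMARY KEY,
    name                 TEXT NOT NULL,
    biochemical_function TEXT,
    source_organism      TEXT,
    family_id            INTEGER NOT NULL REFERENCES homologous_family(id)
);
-- Sequence section: residues plus position-specific annotation
CREATE TABLE sequence (
    id         INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    accession  TEXT,
    residues   TEXT NOT NULL
);
CREATE TABLE position_annotation (
    id          INTEGER PRIMARY KEY,
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    position    INTEGER NOT NULL,  -- 1-based residue index
    annotation  TEXT NOT NULL
);
-- Structure section: one secondary-structure code per residue
CREATE TABLE structure (
    id                  INTEGER PRIMARY KEY,
    sequence_id         INTEGER NOT NULL REFERENCES sequence(id),
    structure_code      TEXT NOT NULL,
    secondary_structure TEXT
);
"""


def build_demo_db():
    """Create an in-memory database and load made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO superfamily VALUES (?, ?)",
                     [(1, "demo_superfamily_A"), (2, "demo_superfamily_B")])
    conn.executemany("INSERT INTO homologous_family VALUES (?, ?, ?)",
                     [(1, "demo_family_A1", 1), (2, "demo_family_A2", 1),
                      (3, "demo_family_B1", 2)])
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?, ?)",
                     [(1, "toy_protein_1", "toy function", "toy organism", 1),
                      (2, "toy_protein_2", "toy function", "toy organism", 2),
                      (3, "toy_protein_3", "toy function", "toy organism", 3)])
    conn.executemany("INSERT INTO sequence VALUES (?, ?, ?, ?)",
                     [(1, 1, "TOY0001", "MKTAYSAG"),
                      (2, 2, "TOY0002", "MGHSLGAA"),
                      (3, 3, "TOY0003", "MLLKSV")])
    conn.execute("INSERT INTO position_annotation VALUES (?, ?, ?, ?)",
                 (1, 1, 6, "placeholder active-site residue"))
    conn.execute("INSERT INTO structure VALUES (?, ?, ?, ?)",
                 (1, 1, "XXXX", "CCHHHHCC"))
    conn.commit()
    return conn


def sequences_per_superfamily(conn):
    """Count sequences in each superfamily by joining across the hierarchy."""
    return conn.execute("""
        SELECT sf.name, COUNT(s.id)
        FROM superfamily sf
        JOIN homologous_family hf ON hf.superfamily_id = sf.id
        JOIN protein p ON p.family_id = hf.id
        JOIN sequence s ON s.protein_id = p.id
        GROUP BY sf.name
        ORDER BY sf.name
    """).fetchall()


def annotated_residues(conn, sequence_id):
    """Return (position, residue, annotation) for each annotated position."""
    return conn.execute("""
        SELECT a.position, substr(s.residues, a.position, 1), a.annotation
        FROM position_annotation a
        JOIN sequence s ON s.id = a.sequence_id
        WHERE s.id = ?
        ORDER BY a.position
    """, (sequence_id,)).fetchall()
```

Calling build_demo_db() and passing the connection to sequences_per_superfamily or annotated_residues shows how a single join can combine classification, sequence, and annotation data, which is the kind of cross-section query an integrated warehouse is meant to support.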

CONCLUSION

DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cabb/1647292/59e7eba86a77/1471-2105-7-495-1.jpg
