Protein Construction-Based Data Partitioning Scheme for Alignment of Protein Macromolecular Structures Through Distributed Querying in Federated Databases.

Publication information

IEEE Trans Nanobioscience. 2020 Jan;19(1):102-116. doi: 10.1109/TNB.2019.2930494. Epub 2019 Jul 22.

DOI: 10.1109/TNB.2019.2930494

Abstract

Exploring the characteristics of 3D protein structures by querying relational databases that store them can be challenging, because the data must conform to a particular database schema. However, this approach also brings several advantages, such as the ability to perform extensive database searches with the declarative SQL language, to protect data against hardware damage through regular backup mechanisms, and to secure data against unauthorized access. Since relational databases do not provide exploration methods specific to protein data and its biological semantics, such as searches based on protein structural patterns, their use in this domain is still rare and requires the development of dedicated methods to speed up data exploration. In this paper, we present a novel data partitioning scheme for distributing data across database clusters that can be used to perform sophisticated explorations of 3D protein structures. The data partitioning scheme relies on protein construction, which requires data preprocessing but results in shorter exploration times when querying federated databases. We solve the problem of finding proteins in an Oracle relational database on the basis of 3D protein structure similarity by using distributed PAR-P3D-SQL queries. Since 3D protein structure similarity searching is one of the most time-consuming exploration processes that can be performed on protein data, we use a distributed environment of Oracle federated databases, distributed query processing, and dedicated load balancing methods to accelerate the exploration. Test results confirm that we are able to significantly increase the speed of the exploration process, in proportion to the number of database nodes in the federated environment.
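The abstract does not spell out the partitioning or load balancing rules, so the following is only a minimal sketch of the general idea of spreading structures across database nodes so that each node receives a similar amount of comparison work. It is not the authors' PAR-P3D-SQL scheme. The function names, the use of residue count as a cost estimate, and the identifiers in the example are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def partition_by_cost(costs: Dict[str, int], n_nodes: int) -> List[List[str]]:
    """Assign items to nodes with a greedy 'largest cost first' heuristic.

    `costs` maps a structure identifier to an estimated comparison cost
    (assumed here to be the residue count; the paper may use something else).
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # Min-heap of (current load, node index): the lightest node is on top.
    heap: List[Tuple[int, int]] = [(0, i) for i in range(n_nodes)]
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(n_nodes)]
    # Place the most expensive structures first; ties are broken by
    # identifier so the result is deterministic.
    for pid, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        partitions[node].append(pid)
        heapq.heappush(heap, (load + cost, node))
    return partitions


def node_loads(costs: Dict[str, int], partitions: List[List[str]]) -> List[int]:
    """Total estimated cost assigned to each node."""
    return [sum(costs[pid] for pid in part) for part in partitions]


# Hypothetical identifiers and residue counts, for illustration only.
example_costs = {"P1": 900, "P2": 700, "P3": 400, "P4": 300, "P5": 200, "P6": 100}
example_partitions = partition_by_cost(example_costs, 3)
print(example_partitions)
print(node_loads(example_costs, example_partitions))
```

With roughly equal loads per node, the slowest node finishes at about the same time as the others, which is one way a near-proportional speedup with the number of nodes could arise. A real deployment would also have to account for query distribution and result merging, which this sketch leaves out.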

Similar articles

PSS-SQL: protein secondary structure - structured query language.

Annu Int Conf IEEE Eng Med Biol Soc. 2010;2010:1073-6. doi: 10.1109/IEMBS.2010.5627303.

Using SQL Databases for Sequence Similarity Searching and Analysis.

Curr Protoc Bioinformatics. 2017 Sep 13;59:9.4.1-9.4.22. doi: 10.1002/cpbi.32.

PSI: indexing protein structures for fast similarity search.

Bioinformatics. 2003;19 Suppl 1:i81-3. doi: 10.1093/bioinformatics/btg1009.

PTGL--a web-based database application for protein topologies.

Bioinformatics. 2004 Nov 22;20(17):3277-9. doi: 10.1093/bioinformatics/bth367. Epub 2004 Jun 24.

Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment.

Molecules. 2019 Jan 5;24(1):179. doi: 10.3390/molecules24010179.

Structural Class Classification of 3D Protein Structure Based on Multi-View 2D Images.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Jan-Feb;15(1):286-299. doi: 10.1109/TCBB.2016.2603987. Epub 2016 Aug 29.

Protein structure alignment and fast similarity search using local shape signatures.

J Bioinform Comput Biol. 2004 Mar;2(1):215-39. doi: 10.1142/s0219720004000533.

Multiple Alignment of protein structures and sequences for VMD.

Bioinformatics. 2006 Feb 15;22(4):504-6. doi: 10.1093/bioinformatics/bti825. Epub 2005 Dec 8.

Evaluating protein similarity from coarse structures.

IEEE/ACM Trans Comput Biol Bioinform. 2009 Oct-Dec;6(4):583-93. doi: 10.1109/TCBB.2007.70250.

PMID: 31329125
