Benchmarking database performance for genomic data.

Author information

Khushi Matloob

Affiliations

Bioinformatics Unit, Children's Medical Research Institute, Westmead, NSW, Australia; Centre for Cancer Research, Westmead Millennium Institute; Sydney Medical School, Westmead, University of Sydney, Sydney, Australia.

Publication information

J Cell Biochem. 2015 Jun;116(6):877-83. doi: 10.1002/jcb.25049.

DOI:10.1002/jcb.25049

PMID:25560631

Abstract

Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing genomic operations such as identifying overlapping or non-overlapping regions, or finding the nearest gene annotations, is a common research need. The data can be saved in a database system for easy management; however, no database currently offers a comprehensive built-in algorithm for identifying overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking showed that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads also performed better in PostgreSQL, although the general searching capability of the two databases was almost equivalent. In addition, the algorithm was used to report pairwise overlaps of more than 1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, and HNF4G was found to co-locate significantly with the cohesin subunit STAG1 (SA1).
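To make the overlap operation concrete, here is a minimal sketch of a SQL interval-overlap join, run through Python's built-in `sqlite3` module. It is not the RegMap algorithm, whose details the abstract does not give, and it uses SQLite rather than PostgreSQL or MySQL. The table names `a` and `b`, the columns `chrom`, `start_pos` and `end_pos`, the function `find_overlaps`, and the choice of closed (inclusive) coordinates are all assumptions made for illustration.

```python
import sqlite3


def find_overlaps(regions_a, regions_b):
    """Return overlapping region pairs from two lists of (chrom, start, end).

    Assumes closed intervals: two regions on the same chromosome overlap
    when each one starts at or before the other one ends.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical schema: one table per dataset.
    for table in ("a", "b"):
        cur.execute(
            f"CREATE TABLE {table} "
            "(chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
        )
    cur.executemany("INSERT INTO a VALUES (?, ?, ?)", regions_a)
    cur.executemany("INSERT INTO b VALUES (?, ?, ?)", regions_b)

    # An index on the joined columns lets the engine avoid a full scan of b.
    cur.execute("CREATE INDEX idx_b ON b (chrom, start_pos, end_pos)")

    rows = cur.execute(
        """
        SELECT a.chrom, a.start_pos, a.end_pos, b.start_pos, b.end_pos
        FROM a
        JOIN b
          ON a.chrom = b.chrom
         AND a.start_pos <= b.end_pos
         AND b.start_pos <= a.end_pos
        ORDER BY a.chrom, a.start_pos, b.start_pos
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    peaks_b = [("chr1", 150, 250), ("chr1", 700, 800), ("chr2", 300, 400)]
    for row in find_overlaps(peaks_a, peaks_b):
        print(row)
```

The two inequality conditions in the join are the standard test for interval overlap. How fast such a query runs depends heavily on the database engine and its indexing, which is the kind of difference the benchmark compares.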

