Aniceto Rodrigo, Xavier Rene, Guimarães Valeria, Hondo Fernanda, Holanda Maristela, Walter Maria Emilia, Lifschitz Sérgio
Computer Science Department, University of Brasilia (UNB), 70910-900 Brasilia, DF, Brazil.
Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), 22451-900 Rio de Janeiro, RJ, Brazil.
Int J Genomics. 2015;2015:502795. doi: 10.1155/2015/502795. Epub 2015 Oct 19.
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistence of genomic data, particularly the storage and analysis of these large-scale processed data. Finding an alternative to the widely used relational database model has therefore become a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for write and retrieval operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We analyze persistence and I/O operations on real data using the Cassandra database system. We also compare these results with those obtained from a classical relational database system and from another NoSQL database approach, MongoDB.