Department of Biomedical Informatics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Ave, Nashville, TN 37232, USA.
Database (Oxford). 2013 Jul 26;2013:bat056. doi: 10.1093/database/bat056. Print 2013.
Efficient storage and retrieval of genomic annotations based on range intervals is essential, given the volume of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks, which has led to the development of stand-alone applications that depend on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from within a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
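To make the range-interval retrieval described in the abstract concrete, the following is a minimal in-memory sketch of the general nested containment list (NCList) idea in Python. It is not MyNCList's MySQL implementation, whose schema and queries are not described here. The names `Node`, `build_nclist` and `query`, the closed-interval coordinate convention, and the sample annotations are all assumptions made for illustration.

```python
class Node:
    """One annotation interval plus the intervals nested inside it (hypothetical helper)."""

    def __init__(self, start, end, label):
        self.start, self.end, self.label = start, end, label
        self.children = []  # intervals fully contained in this one


def build_nclist(intervals):
    """Build a nested containment list from (start, end, label) tuples.

    Assumes closed coordinates (both ends inclusive). Sorting by start
    ascending and end descending guarantees that a containing interval is
    always seen before the intervals it contains.
    """
    top, stack = [], []
    for start, end, label in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        node = Node(start, end, label)
        # Pop until the top of the stack contains the new interval.
        while stack and stack[-1].end < end:
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def query(nclist, qstart, qend):
    """Return labels of all intervals overlapping [qstart, qend]."""
    # Within one list no interval contains another, so both starts and ends
    # are ascending. Binary-search for the first interval with end >= qstart.
    lo, hi = 0, len(nclist)
    while lo < hi:
        mid = (lo + hi) // 2
        if nclist[mid].end < qstart:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    for node in nclist[lo:]:
        if node.start > qend:  # every later interval starts even further right
            break
        hits.append(node.label)
        hits.extend(query(node.children, qstart, qend))
    return hits


# Made-up annotations for illustration only.
annotations = [
    (100, 500, "geneA"),
    (120, 200, "geneA_exon1"),
    (300, 400, "geneA_exon2"),
    (450, 900, "geneB"),
    (1000, 1200, "geneC"),
]
nclist = build_nclist(annotations)
print(query(nclist, 150, 150))  # a single-position variant
print(query(nclist, 350, 460))  # a range query
```

Because each sublist is free of containment, a query needs only one binary search per visited list followed by a short linear scan, which is the property that makes the structure attractive for range-based annotation lookups.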