Suppr超能文献

欧洲核苷酸档案库的千万亿字节级创新。

Petabyte-scale innovations at the European Nucleotide Archive.

作者信息

Cochrane Guy, Akhtar Ruth, Bonfield James, Bower Lawrence, Demiralp Fehmi, Faruque Nadeem, Gibson Richard, Hoad Gemma, Hubbard Tim, Hunter Christopher, Jang Mikyung, Juhos Szilveszter, Leinonen Rasko, Leonard Steven, Lin Quan, Lopez Rodrigo, Lorenc Dariusz, McWilliam Hamish, Mukherjee Gaurab, Plaister Sheila, Radhakrishnan Rajesh, Robinson Stephen, Sobhany Siamak, Hoopen Petra Ten, Vaughan Robert, Zalunin Vadim, Birney Ewan

机构信息

EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

出版信息

Nucleic Acids Res. 2009 Jan;37(Database issue):D19-25. doi: 10.1093/nar/gkn765. Epub 2008 Oct 31.

Abstract

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.

摘要

核苷酸测序仪通量的急剧增加以及性能不断提升的前景,已将生物信息学推进到了千万亿字节规模数据集的时代。为全球计算基础设施提供这些数据集数据来源的序列数据库,正受到这些数据量的影响而面临挑战。由欧洲分子生物学实验室核苷酸序列数据库(EMBL Nucleotide Sequence Database)和Ensembl序列追踪数据库(Ensembl Trace Archive)组成的欧洲核苷酸档案库(ENA;http://www.ebi.ac.uk/embl),已明确了在千万亿字节规模数据集的存储、传输、分析、解读及可视化方面所面临的挑战。我们在此展示我们新的下一代序列数据存档库,简要概述ENA的内容,并详细介绍提交管道、基于规则的高通量验证基础设施及数据整合方法的主要进展。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c14b/2686451/3a0786a1c882/gkn765f1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验