Petabyte-scale innovations at the European Nucleotide Archive.

Authors

Cochrane Guy, Akhtar Ruth, Bonfield James, Bower Lawrence, Demiralp Fehmi, Faruque Nadeem, Gibson Richard, Hoad Gemma, Hubbard Tim, Hunter Christopher, Jang Mikyung, Juhos Szilveszter, Leinonen Rasko, Leonard Steven, Lin Quan, Lopez Rodrigo, Lorenc Dariusz, McWilliam Hamish, Mukherjee Gaurab, Plaister Sheila, Radhakrishnan Rajesh, Robinson Stephen, Sobhany Siamak, Hoopen Petra Ten, Vaughan Robert, Zalunin Vadim, Birney Ewan

Affiliation

EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Publication information

Nucleic Acids Res. 2009 Jan;37(Database issue):D19-25. doi: 10.1093/nar/gkn765. Epub 2008 Oct 31.

DOI:10.1093/nar/gkn765
PMID:18978013
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2686451/
Abstract

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c14b/2686451/3a0786a1c882/gkn765f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c14b/2686451/09600eb0c5cc/gkn765f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c14b/2686451/76ea4b4031f0/gkn765f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c14b/2686451/81fae543517f/gkn765f4.jpg
