
The future of DNA sequence archiving.

Affiliation

EMBL-Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom.

Publication information

Gigascience. 2012 Jul 12;1(1):2. doi: 10.1186/2047-217X-1-2.

DOI: 10.1186/2047-217X-1-2
PMID: 23587147
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3617450/

Abstract

Archives operating under the International Nucleotide Sequence Database Collaboration currently preserve all submitted sequences equally, but rapid increases in the rate of global sequence production will soon require differentiated treatment of DNA sequences submitted for archiving. Here, we propose a graded system in which the ease of reproduction of a sequencing-based experiment and the relative availability of a sample for resequencing define the level of lossy compression applied to stored data.
