Investigation into the annotation of protocol sequencing steps in the sequence read archive.

Author information

Alnasir Jamie, Shanahan Hugh P

Affiliation

Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX UK.

Publication information

Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015.

DOI: 10.1186/s13742-015-0064-7

PMID: 25960871

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4425880/

Abstract

BACKGROUND

The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined.

RESULTS

We examined the experimental metadata of the public Sequence Read Archive (SRA) repository to ascertain how well important sequencing steps are annotated in submissions to the database. Using SQL queries against the SRAdb SQLite database (generated by the Bioconductor consortium), we searched for keywords commonly occurring in key preparatory protocol steps, with records partitioned over studies. We found that 7.10%, 5.84% and 7.57% of all records had at least one keyword corresponding to fragmentation, ligation and enrichment, respectively. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records).
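The keyword search described above can be sketched in Python with the standard sqlite3 module. This is a minimal illustration under stated assumptions, not the authors' queries: the table name experiment and the columns study_accession and library_construction_protocol are assumed to resemble the SRAdb schema, and the keyword stems in STEP_KEYWORDS are hypothetical examples rather than the lists used in the study. The sketch counts the percentage of records whose protocol text matches each step, and of records matching all three, at the record level only; it does not reproduce the partitioning over studies reported in the results.

```python
import sqlite3

# Hypothetical keyword stems for each protocol step (illustrative only,
# not the keyword lists used in the study).
STEP_KEYWORDS = {
    "fragmentation": ["fragment", "sonicat", "nebuliz", "shear"],
    "ligation": ["ligat", "adapter", "adaptor"],
    "enrichment": ["enrich", "pcr", "amplif"],
}


def step_condition(keywords):
    """Return an OR-ed SQL condition and its parameters for one step.

    Assumes protocol text lives in a column named
    library_construction_protocol; NULL values never match.
    """
    clause = " OR ".join(
        "LOWER(library_construction_protocol) LIKE ?" for _ in keywords
    )
    params = [f"%{k.lower()}%" for k in keywords]
    return f"({clause})", params


def annotation_rates(conn):
    """Percentage of experiment records matching each step, and all three."""
    cur = conn.cursor()
    (total,) = cur.execute("SELECT COUNT(*) FROM experiment").fetchone()
    if total == 0:
        raise ValueError("experiment table is empty")
    rates = {}
    all_clauses, all_params = [], []
    for step, keywords in STEP_KEYWORDS.items():
        clause, params = step_condition(keywords)
        (n,) = cur.execute(
            f"SELECT COUNT(*) FROM experiment WHERE {clause}", params
        ).fetchone()
        rates[step] = 100.0 * n / total
        all_clauses.append(clause)
        all_params.extend(params)
    # A record counts towards "all_three" only if every step condition holds.
    (n_all,) = cur.execute(
        "SELECT COUNT(*) FROM experiment WHERE " + " AND ".join(all_clauses),
        all_params,
    ).fetchone()
    rates["all_three"] = 100.0 * n_all / total
    return rates


def build_demo_db():
    """Toy in-memory table loosely modelled on an SRA experiment table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment (experiment_accession TEXT, "
        "study_accession TEXT, library_construction_protocol TEXT)"
    )
    rows = [
        ("SRX1", "SRP1", "DNA was sheared by sonication, adapters ligated, PCR enriched"),
        ("SRX2", "SRP1", "Fragmented with nebulization"),
        ("SRX3", "SRP2", None),
        ("SRX4", "SRP2", "Standard Illumina protocol"),
    ]
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    # → {'fragmentation': 50.0, 'ligation': 25.0, 'enrichment': 25.0, 'all_three': 25.0}
    print(annotation_rates(build_demo_db()))
```

To run the sketch against real metadata, build_demo_db() would be replaced by a connection to the SQLite file distributed for SRAdb (commonly named SRAmetadb.sqlite); check the actual table and column names first, since the schema used here is an assumption. Keywords are passed as bound parameters rather than spliced into the SQL string, so the stems can be edited freely without quoting concerns.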

CONCLUSIONS

The current level of annotation in the SRA inhibits systematic studies of bias arising from these protocol steps. As a consequence, meta-analyses and comparative studies based on these data will carry a source of bias that cannot at present be quantified.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/562d/4425880/8cc94b9b03a6/13742_2015_64_Fig1_HTML.jpg

Similar articles

Investigation into the annotation of protocol sequencing steps in the sequence read archive.

Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015.

SRAdb: query and use public next-generation sequencing data from within R.

BMC Bioinformatics. 2013 Jan 17;14:19. doi: 10.1186/1471-2105-14-19.

METAGENOTE: a simplified web platform for metadata annotation of genomic samples and streamlined submission to NCBI's sequence read archive.

BMC Bioinformatics. 2020 Sep 3;21(1):378. doi: 10.1186/s12859-020-03694-0.

Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive.

Gigascience. 2017 Jun 1;6(6):1-8. doi: 10.1093/gigascience/gix029.

pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive.

F1000Res. 2019 Apr 23;8:532. doi: 10.12688/f1000research.18676.1. eCollection 2019.

TerrestrialMetagenomeDB: a public repository of curated and standardized metadata for terrestrial metagenomes.

Nucleic Acids Res. 2020 Jan 8;48(D1):D626-D632. doi: 10.1093/nar/gkz994.

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Front Immunol. 2018 Aug 16;9:1877. doi: 10.3389/fimmu.2018.01877. eCollection 2018.

Systematic comparison of small RNA library preparation protocols for next-generation sequencing.

BMC Genomics. 2018 Feb 5;19(1):118. doi: 10.1186/s12864-018-4491-6.

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.

Bioinformatics. 2017 Sep 15;33(18):2914-2923. doi: 10.1093/bioinformatics/btx334.

Extracting allelic read counts from 250,000 human sequencing runs in Sequence Read Archive.

Pac Symp Biocomput. 2019;24:196-207.

Cited by

Soft Tissue Ewing Sarcoma Cell Drug Resistance Revisited: A Systems Biology Approach.

Int J Environ Res Public Health. 2023 Jul 3;20(13):6288. doi: 10.3390/ijerph20136288.

A Counterintuitive Neutrophil-Mediated Pattern in COVID-19 Patients Revealed through Transcriptomics Analysis.

Viruses. 2022 Dec 30;15(1):104. doi: 10.3390/v15010104.

Microbial Dark Matter: from Discovery to Applications.

Genomics Proteomics Bioinformatics. 2022 Oct;20(5):867-881. doi: 10.1016/j.gpb.2022.02.007. Epub 2022 Apr 26.

Importance of experimental information (metadata) for archived sequence data: case of specific gene bias due to lag time between sample harvest and RNA protection in RNA sequencing.

PeerJ. 2021 Aug 25;9:e11875. doi: 10.7717/peerj.11875. eCollection 2021.

In Silico Methods for the Identification of Diagnostic and Favorable Prognostic Markers in Acute Myeloid Leukemia.

Int J Mol Sci. 2021 Sep 5;22(17):9601. doi: 10.3390/ijms22179601.

Identification of conserved transcriptome features between humans and Drosophila in the aging brain utilizing machine learning on combined data from the NIH Sequence Read Archive.

PLoS One. 2021 Aug 11;16(8):e0255085. doi: 10.1371/journal.pone.0255085. eCollection 2021.

Investigating Molecular Determinants of Cancer Cell Resistance to Ionizing Radiation Through an Integrative Bioinformatics Approach.

Front Cell Dev Biol. 2021 Apr 7;9:620248. doi: 10.3389/fcell.2021.620248. eCollection 2021.

Ten simple rules for annotating sequencing experiments.

PLoS Comput Biol. 2020 Oct 5;16(10):e1008260. doi: 10.1371/journal.pcbi.1008260. eCollection 2020 Oct.

Comparative Study of Gut Microbiota in Wild and Captive Giant Pandas (Ailuropoda melanoleuca).

Genes (Basel). 2019 Oct 20;10(10):827. doi: 10.3390/genes10100827.

Comprehensive analysis of circRNA expression profiles in humans by RAISE.

Int J Oncol. 2017 Dec;51(6):1625-1638. doi: 10.3892/ijo.2017.4162. Epub 2017 Oct 16.
