Investigation into the annotation of protocol sequencing steps in the sequence read archive.

Authors

Alnasir Jamie, Shanahan Hugh P

Affiliation

Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX UK.

Publication

Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015.

Abstract

BACKGROUND

The workflow for producing high-throughput sequencing data from nucleic acid samples is complex. Preparing samples for next-generation sequencing involves a series of protocol steps. The bias introduced by several of these steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, has yet to be quantified.

RESULTS

We examined the experimental metadata of the public repository Sequence Read Archive (SRA) to ascertain how well important sequencing steps are annotated in submissions to the database. We ran SQL queries against the SRAdb SQLite database generated by the Bioconductor consortium, searching for keywords that commonly occur in descriptions of key preparatory protocol steps. Partitioned over studies, 7.10%, 5.84% and 7.57% of all records had at least one keyword for fragmentation, ligation and enrichment, respectively. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records).
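The keyword search described above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. It is not the authors' query. The table name `sra`, the column names `study_accession` and `library_construction_protocol`, the keyword lists and the toy rows are all assumptions made for illustration, and averaging per-study fractions is only one possible reading of "partitioned over studies".

```python
import sqlite3

# Hypothetical keyword stems per protocol step; the abstract does not list
# the keywords actually used.
KEYWORDS = {
    "fragmentation": ["fragment", "sonicat", "shear"],
    "ligation": ["ligat", "adapter"],
    "enrichment": ["enrich", "pcr", "amplif"],
}


def build_toy_db():
    """Create an in-memory stand-in for the metadata table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sra (experiment_accession TEXT, "
        "study_accession TEXT, library_construction_protocol TEXT)"
    )
    rows = [
        ("X1", "S1", "DNA was sheared by sonication, adapters ligated, then PCR enriched"),
        ("X2", "S1", "Fragmented with Covaris"),
        ("X3", "S2", None),  # no protocol text at all
        ("X4", "S2", "standard protocol"),
        ("X5", "S2", "see manufacturer instructions"),
    ]
    conn.executemany("INSERT INTO sra VALUES (?, ?, ?)", rows)
    return conn


def _match_clause(steps):
    """Build a parameterised condition: every step needs at least one keyword hit."""
    clauses, params = [], []
    for step in steps:
        words = KEYWORDS[step]
        ors = " OR ".join(
            "LOWER(library_construction_protocol) LIKE ?" for _ in words
        )
        clauses.append("(" + ors + ")")
        params.extend("%" + w + "%" for w in words)
    return " AND ".join(clauses), params


def fraction_of_records(conn, steps):
    """Fraction of all records whose protocol text mentions every given step."""
    where, params = _match_clause(steps)
    sql = f"SELECT AVG(CASE WHEN {where} THEN 1.0 ELSE 0.0 END) FROM sra"
    return conn.execute(sql, params).fetchone()[0]


def mean_fraction_per_study(conn, steps):
    """Mean of per-study fractions (an assumed reading of 'partitioned over studies')."""
    where, params = _match_clause(steps)
    sql = (
        "SELECT AVG(frac) FROM ("
        f"SELECT AVG(CASE WHEN {where} THEN 1.0 ELSE 0.0 END) AS frac "
        "FROM sra GROUP BY study_accession)"
    )
    return conn.execute(sql, params).fetchone()[0]


conn = build_toy_db()
all_steps = list(KEYWORDS)
print(fraction_of_records(conn, ["fragmentation"]))
print(fraction_of_records(conn, all_steps))
print(mean_fraction_per_study(conn, all_steps))
```

Records with a missing protocol description count as unannotated here, because a `NULL` comparison falls through to the `ELSE` branch. The two aggregation functions can differ whenever studies contain different numbers of records, which is why both kinds of percentage are worth reporting.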

CONCLUSIONS

The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/562d/4425880/8cc94b9b03a6/13742_2015_64_Fig1_HTML.jpg