Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria.

Affiliation

Department of Ecology and Genetics, Uppsala University, Uppsala SE-75236, Sweden.

Publication information

BMC Genomics. 2014 Jun 12;15(1):467. doi: 10.1186/1471-2164-15-467.

Abstract

BACKGROUND

Genome and transcriptome sequencing applications that rely on variation in sequence depth can be negatively affected if there are systematic biases in coverage. We have investigated patterns of local variation in sequencing coverage by utilising ultra-deep sequencing (>100,000X) of mtDNA obtained during sequencing of two vertebrate genomes, wolverine (Gulo gulo) and collared flycatcher (Ficedula albicollis). With such extreme depth, stochastic variation in coverage should be negligible, which allows us to provide a very detailed, fine-scale picture of sequence-dependent coverage variation and sequencing error rates.

RESULTS

Sequencing coverage showed up to six-fold variation across the complete mtDNA, and this variation was highly repeatable across multiple sequenced individuals of the same species. Moreover, coverage in orthologous regions was correlated between the two species and was negatively correlated with GC content. We also found a negative correlation between the site-specific sequencing error rate and coverage, with certain sequence motifs (for example "CCNGCC") being particularly prone to high rates of error and low coverage.

CONCLUSIONS

Our results demonstrate that inherent sequence characteristics govern variation in coverage and suggest that some sources of this variation, such as GC content, should be controlled for in applications such as RNA-Seq and the detection of copy number variation.
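One common way to control for GC content in read-depth analyses is to rescale each window by the typical depth of windows with similar GC content. The sketch below shows that idea under stated assumptions; it is not taken from the paper, and the function name `gc_normalise`, the number of bins and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_normalise(depths, gc_values, n_bins=10):
    """Rescale each window's depth by the median depth of its GC bin.

    After rescaling, every GC bin has the same median depth as the whole
    data set, so remaining differences are not explained by GC content.
    """
    def bin_of(gc):
        # Clamp so that gc == 1.0 falls into the last bin.
        return min(int(gc * n_bins), n_bins - 1)

    by_bin = defaultdict(list)
    for d, gc in zip(depths, gc_values):
        by_bin[bin_of(gc)].append(d)

    overall = median(depths)
    bin_median = {b: median(ds) for b, ds in by_bin.items()}
    return [d * overall / bin_median[bin_of(gc)]
            for d, gc in zip(depths, gc_values)]


# Hypothetical windows: GC-rich windows have about half the depth of
# GC-poor ones, and the last window is additionally halved.
depths = [100, 110, 50, 55, 100, 25]
gc_values = [0.32, 0.35, 0.62, 0.65, 0.33, 0.63]

normalised = gc_normalise(depths, gc_values)
print(normalised)
```

In this toy case the first and third windows end up with equal normalised depth even though their raw depths differ two-fold, while the last window still stands out as low, which is the kind of signal a copy number analysis would want to keep.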

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f61/4070552/27ef5ecac1c2/12864_2014_6152_Fig1_HTML.jpg
