What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual.

Author information

Whitacre Lynsey K, Tizioto Polyana C, Kim JaeWoo, Sonstegard Tad S, Schroeder Steven G, Alexander Leeson J, Medrano Juan F, Schnabel Robert D, Taylor Jeremy F, Decker Jared E

Affiliations

Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.

Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.

Publication information

BMC Genomics. 2015 Dec 29;16:1114. doi: 10.1186/s12864-015-2313-7.

Abstract

BACKGROUND

Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped.

RESULTS

We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are either not present in the US or are not known to infect taurine cattle, and the reference animal appears to have been host to unsequenced sister species.
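
The two bookkeeping steps around the assembly, collecting unmapped reads and choosing the closest database match for each contig, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the aligner, assembler, or file formats used. The sketch assumes alignments are available as SAM text (where FLAG bit 0x4 marks an unmapped read) and that BLAST results are in the 12-column tabular format (`-outfmt 6`, bit score in the last column). The function names `unmapped_read_names` and `best_hits` are hypothetical, and the de novo assembly step between them is not shown.

```python
from typing import Dict, Iterable, List, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Return names of reads whose SAM FLAG has the unmapped bit set.

    Assumes plain-text SAM records; header lines start with '@'.
    """
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query contig to (subject id, bit score) of its top hit.

    Assumes the default 12-column BLAST tabular layout, where
    column 1 is the query, column 2 the subject, column 12 the bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep only the highest-scoring subject seen so far for this contig.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

Ranking by bit score is one reasonable definition of "closest matching sequence"; ranking by E-value or percent identity would be equally defensible, and the abstract does not say which criterion was applied.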

CONCLUSIONS

We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of parasitic or commensal organisms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41bf/4696311/956f386a29e4/12864_2015_2313_Fig1_HTML.jpg
