Whitacre Lynsey K, Tizioto Polyana C, Kim JaeWoo, Sonstegard Tad S, Schroeder Steven G, Alexander Leeson J, Medrano Juan F, Schnabel Robert D, Taylor Jeremy F, Decker Jared E
Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.
BMC Genomics. 2015 Dec 29;16:1114. doi: 10.1186/s12864-015-2313-7.
Next-generation sequencing projects commonly begin by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remains unmapped.
We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by aligning it to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are either not present in the US or not known to infect taurine cattle, so the reference animal appears to have been host to unsequenced sister species.
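The step of assigning each contig its closest database match can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`) and that "closest match" means the hit with the highest bit score, neither of which the abstract specifies. The function name `best_hits` and the contig and subject identifiers are hypothetical, and the example rows are made up.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {contig_id: hit_row}, keeping the highest-bitscore hit per contig.

    Assumption: bit score is the ranking criterion; the source does not say
    how the closest match was chosen.
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Made-up rows for illustration only; not data from the study.
example = (
    "contig_1\tsubject_A\t98.5\t400\t6\t0\t1\t400\t101\t500\t1e-180\t700\n"
    "contig_1\tsubject_B\t91.0\t380\t30\t4\t5\t384\t20\t399\t1e-120\t450\n"
    "contig_2\tsubject_C\t88.2\t250\t25\t3\t1\t250\t900\t1149\t2e-70\t280\n"
)

hits = best_hits(io.StringIO(example))
for contig, row in hits.items():
    print(contig, row["sseqid"], row["bitscore"])
```

In practice the subject identifiers would then be mapped to taxa so that contigs can be grouped as vertebrate or invertebrate; that lookup is omitted here because it depends on database details the abstract does not give.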
We demonstrate the importance of exploring unmapped reads, both to ascertain sequences that are absent or misassembled in the reference assembly and to detect sequences indicative of parasitic or commensal organisms.