TraRECo: a greedy approach based de novo transcriptome assembler with read error correction using consensus matrix.

Affiliations

Department of Electronics Eng., College of Engineering, Dankook University, Yongin-si, Korea.

Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea.

Publication information

BMC Genomics. 2018 Sep 4;19(1):653. doi: 10.1186/s12864-018-5034-x.

Abstract

BACKGROUND

The challenges in developing a good de novo transcriptome assembler include dealing with read errors and sequence repeats. Almost all de novo assemblers use a de Bruijn graph, whose complexity grows linearly with data size but which is vulnerable to errors and repeats. Although errors can be corrected by inspecting the topological structure of the graph, this is difficult when there are too many branches. There are two research directions: improving graph reliability or improving path-search precision. In this study, we focused on the former.
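
A toy example shows how a single read error adds branches to a de Bruijn graph. The following Python sketch is illustrative only and is not part of TraRECo. The reads, the value k = 4, and the function names (`de_bruijn_edges`, `branching_nodes`) are hypothetical. It follows the common convention that nodes are (k-1)-mers and that each k-mer in a read adds an edge from its prefix to its suffix. It then reports every node with more than one successor.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. where a path search must choose."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


clean = ["ACGTTGCAAG", "TTGCAAGCTA"]
noisy = clean + ["ACGTAGCAAG"]  # copy of the first read with T->A at index 4

print(branching_nodes(de_bruijn_edges(clean, 4)))  # []
print(branching_nodes(de_bruijn_edges(noisy, 4)))  # ['AGC', 'CGT']
```

With the error-free reads the graph has no branching nodes. Adding one read with a single substitution produces two branching nodes. At `CGT` the erroneous path leaves the true sequence. At `AGC` it spuriously joins a k-mer from the other read.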

RESULTS

We present TraRECo, a greedy approach to de novo assembly that employs error-aware graph construction. In the proposed approach, we built contigs by directly aligning reads within a distance margin, and we performed a junction search to construct splicing graphs. In doing so, a contig of length l was represented by a 4 × l matrix (called a consensus matrix), in which each element was the count of the corresponding base among the reads aligned to that position so far. A representative sequence, obtained by taking the majority base in each column of the consensus matrix, was used for further read alignment. Once the splicing graphs had been obtained, we used IsoLasso to find paths with noticeable read depth. Experiments using real and simulated reads showed that the method considerably improved sensitivity. It also gave moderately better performance when sensitivity and precision were considered together. This was achieved by error-aware graph construction using the consensus matrix, which made reads containing errors usable for graph construction (otherwise, they might eventually have been discarded). This improved the quality of the coverage-depth information used in the subsequent path search step and, ultimately, the reliability of the graph.
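
The consensus-matrix idea described above can be sketched in Python. This is a minimal, hypothetical illustration and not TraRECo's actual implementation, which also performs junction search and builds splicing graphs. The sketch makes several assumptions:

- Alignment is ungapped, and the "distance margin" is a maximum number of mismatches (`margin`).
- Reads contain only A, C, G and T and are in forward orientation.
- Reads may only overhang a contig's right end.
- Majority ties resolve to the earlier base in "ACGT" order.

All names (`ConsensusContig`, `best_offset`, `assemble`, `margin`, `min_overlap`) are hypothetical.

```python
BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


class ConsensusContig:
    """A contig of length l stored as a 4 x l matrix of base counts."""

    def __init__(self, seed_read):
        self.counts = [[] for _ in BASES]  # counts[b][j]: reads with base b at column j
        self.add(seed_read, 0)

    def __len__(self):
        return len(self.counts[0])

    def consensus(self):
        # Majority vote per column; ties resolve to the earlier base in "ACGT".
        seq = []
        for j in range(len(self)):
            column = [row[j] for row in self.counts]
            seq.append(BASES[column.index(max(column))])
        return "".join(seq)

    def add(self, read, offset):
        # Widen the matrix if the read extends past the contig's right end.
        overhang = offset + len(read) - len(self)
        for row in self.counts:
            row.extend([0] * max(0, overhang))
        for j, base in enumerate(read):
            self.counts[INDEX[base]][offset + j] += 1


def best_offset(contig_seq, read, margin, min_overlap):
    """Ungapped alignment: offset with the fewest mismatches, or None if > margin."""
    best = None
    for offset in range(len(contig_seq) - min_overlap + 1):
        window = contig_seq[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best[0] if best is not None and best[1] <= margin else None


def assemble(reads, margin=1, min_overlap=4):
    """Greedy: add each read to the first contig it aligns to, else start a new one."""
    contigs = []
    for read in reads:
        for contig in contigs:
            # Align against the majority (representative) sequence, not the raw counts.
            offset = best_offset(contig.consensus(), read, margin, min_overlap)
            if offset is not None:
                contig.add(read, offset)
                break
        else:
            contigs.append(ConsensusContig(read))
    return contigs


reads = ["ACGTTGCAAG", "ACGTTGCAAG", "ACGTAGCAAG", "TTGCAAGCTA"]
contigs = assemble(reads)
print(len(contigs), contigs[0].consensus())  # 1 ACGTTGCAAGCTA
```

The third read carries a substitution at column 4. It is still absorbed into the contig rather than discarded. Its erroneous A is recorded in the count matrix (T: 3, A: 1 at that column) but is outvoted in the representative sequence. The fourth read overlaps the contig and extends it to the right.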

CONCLUSIONS

De novo assembly is mainly used to explore undiscovered isoforms and must be able to represent as many reads as possible efficiently. In this sense, TraRECo offers a potential alternative for improving graph reliability, even though its computational burden is much higher than that of the single-k-mer de Bruijn graph approach.
