Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada.
Bioinformatics. 2011 Feb 1;27(3):295-302. doi: 10.1093/bioinformatics/btq653. Epub 2010 Nov 26.
High-throughput sequencing technologies produce very large amounts of data, and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data.
We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method to correct reads produced by high-throughput sequencing methods. Our approach provides significantly higher accuracy than previous methods. It is time- and space-efficient and works very well for all read lengths, genome sizes and coverage levels.
The source code of HiTEC is freely available at www.csd.uwo.ca/~ilie/HiTEC/.