Lian S, Tu Y, Wang Y, Chen X, Wang L
School of Physics and Electronic Engineering, Xinyang Normal University, Xinyang City, China.
School of Life Science, Xinyang Normal University, Xinyang City, China.
Genet Mol Res. 2016 Jul 25;15(3):gmr8790. doi: 10.4238/gmr.15038790.
Repetitive sequences of variable length are common in almost all eukaryotic genomes; most are presumed to have important biomedical functions, and they can cause genomic instability. Next-generation sequencing (NGS) technologies make it possible to identify and capture these repetitive sequences directly from NGS data. In this study, we assessed how well leading assemblers, such as Velvet, SOAPdenovo, SGA, MSR-CA, Bambus2, ALLPATHS-LG, and ABySS, identify and capture repeats, using three real NGS datasets. Our results indicated that most of them performed poorly in capturing repeats. Consequently, we propose a repetitive sequence assembler, named NGSReper, for capturing repeats from NGS data. Simulated datasets were used to validate the feasibility of NGSReper; the results indicate that the completeness of repeat capture reached up to 99%. Cross-validation was performed on three real NGS datasets, and extensive comparisons indicate that NGSReper performed best in terms of completeness and accuracy in capturing repeats. In conclusion, NGSReper is a suitable tool for capturing repeats directly from NGS data.