Rangamaran Vijaya Raghavan, Uppili Bharathram, Gopal Dharani, Ramalingam Kirubagaran
Marine Biotechnology Division, Ocean Science and Technology for Islands Group, National Institute of Ocean Technology (NIOT), Ministry of Earth Sciences (MoES), Government of India, Chennai, India.
Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA University, Tanjore, India.
J Comput Biol. 2018 Dec;25(12):1301-1311. doi: 10.1089/cmb.2017.0186. Epub 2018 Sep 8.
The advent of next-generation sequencing (NGS) technologies has revolutionized genomic research. Millions of sequences are generated in a short period of time, providing researchers with valuable insights. NGS platforms have evolved over time, and their efficiency has steadily increased. Still, primarily because of the sequencing chemistry, glitches in the sequencing machine, and human handling errors, some artifacts tend to persist in the final sequence data set. These sequence errors have a profound impact on downstream analyses and may yield misleading information. Hence, filtering these erroneous reads has become essential, and a myriad of tools are available for this purpose. However, many of them are accessible only through a command line interface that requires the user to enter each command manually. Here, we report EasyQC, a tool for NGS data quality control (QC) with a graphical user interface that provides options to trim NGS reads based on quality, length, homopolymers, and ambiguous bases. EasyQC also includes a format converter, a paired-end merger, an adapter trimmer, and a graph generator that produces quality distribution, length distribution, GC content, and base composition graphs. Comparison of raw and processed sequence data sets using EasyQC suggested a significant increase in the overall quality of the sequences. In tests with NGS data sets on a standalone desktop, EasyQC proved to be relatively fast. EasyQC is developed using Perl modules and runs on Windows and Linux platforms. With its various QC features, easy interface for end users, and cross-platform compatibility, EasyQC would be a valuable addition to the existing tools, facilitating better downstream analyses.
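The four read-level criteria named in the abstract (quality, length, homopolymers, ambiguous bases) and the GC content statistic can be illustrated with a minimal Python sketch. This is not EasyQC's Perl implementation: the `Read` type, the function names, the default thresholds, and the Phred+33 quality encoding are all assumptions chosen for illustration, and the sketch only accepts or rejects whole reads rather than trimming them.

```python
import re
from typing import NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical container, not an EasyQC type)."""
    name: str
    seq: str
    qual: str  # quality string, assumed to be Phred+33 encoded


def mean_quality(read: Read, offset: int = 33) -> float:
    """Average Phred score of the read; 0.0 for an empty quality string."""
    if not read.qual:
        return 0.0
    return sum(ord(c) - offset for c in read.qual) / len(read.qual)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    runs = re.finditer(r"(.)\1*", seq)
    return max((len(m.group(0)) for m in runs), default=0)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def passes_qc(
    read: Read,
    min_mean_q: float = 20.0,   # illustrative threshold
    min_len: int = 30,          # illustrative threshold
    max_homopolymer: int = 8,   # illustrative threshold
    max_ambiguous: int = 0,     # illustrative threshold
) -> bool:
    """Return True if the read satisfies all four filters."""
    seq = read.seq.upper()
    return (
        len(seq) >= min_len
        and mean_quality(read) >= min_mean_q
        and longest_homopolymer(seq) <= max_homopolymer
        and seq.count("N") <= max_ambiguous
    )


if __name__ == "__main__":
    reads = [
        Read("good", "ACGT" * 10, "I" * 40),
        Read("homopolymer", "A" * 40, "I" * 40),
        Read("ambiguous", "ACGN" * 10, "I" * 40),
        Read("low_quality", "ACGT" * 10, "#" * 40),
    ]
    for r in reads:
        print(r.name, passes_qc(r), round(gc_content(r.seq), 2))
```

Keeping each criterion in its own function lets the thresholds be tuned independently, which mirrors the per-criterion options the abstract describes.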