Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, 400010, P.R. China.
Center for Hepatobiliary and Pancreatic Diseases, Beijing Tsinghua Changgung Hospital, Medical Center, Tsinghua University, Beijing, 100044, P.R. China.
Sci Rep. 2017 Aug 14;7(1):8106. doi: 10.1038/s41598-017-08139-y.
Ion Torrent Personal Genome Machine (PGM) technology is a mid-length read, low-cost and high-speed next-generation sequencing platform with a relatively high insertion and deletion (indel) error rate. A full systematic assessment of the effectiveness of various error correction algorithms in PGM viral datasets (e.g., hepatitis B virus (HBV)) has not been performed. We examined 19 quality-trimmed PGM datasets for the HBV reverse transcriptase (RT) region and found a total error rate of 0.48% ± 0.12%. Deletion errors were clearly present at the ends of homopolymer runs. Tests using both real and simulated data showed that the algorithms differed in their abilities to detect and correct errors and that the error rate and sequencing depth significantly affected the performance. Of the algorithms tested, Pollux showed a better overall performance but tended to over-correct 'genuine' substitution variants, whereas Fiona proved to be better at distinguishing these variants from sequencing errors. We found that the combined use of Pollux and Fiona gave the best results when error-correcting Ion Torrent PGM viral data.
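As a rough illustration of the quantities discussed in the abstract, the sketch below locates homopolymer runs in a reference sequence and tallies substitution, insertion and deletion errors from a gapped pairwise alignment. It is not the authors' pipeline: the function names, the `min_len` threshold, the use of `-` as the gap character and the choice of ungapped reference length as the error-rate denominator are all assumptions made for this example.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, end, base) for each run of identical bases.

    `end` is exclusive. `min_len` is a hypothetical threshold, not a value
    taken from the paper.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs


def count_errors(ref_aln, read_aln, gap="-"):
    """Count error types in a gapped pairwise alignment of equal length."""
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have the same length")
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == gap and read_base == gap:
            continue  # gap-only column carries no information
        if ref_base == gap:
            counts["insertion"] += 1  # base present only in the read
        elif read_base == gap:
            counts["deletion"] += 1  # reference base missing from the read
        elif ref_base != read_base:
            counts["substitution"] += 1
    return counts


def error_rate(ref_aln, read_aln, gap="-"):
    """Total errors divided by ungapped reference length (assumed definition)."""
    counts = count_errors(ref_aln, read_aln, gap)
    ref_len = sum(1 for base in ref_aln if base != gap)
    if ref_len == 0:
        raise ValueError("reference alignment contains no bases")
    return sum(counts.values()) / ref_len


if __name__ == "__main__":
    # Toy alignment: one deletion at the end of a T homopolymer, one insertion.
    ref = "ACGTTTTAC-G"
    read = "ACGTTT-ACAG"
    print(homopolymer_runs(ref.replace("-", "")))
    print(count_errors(ref, read))
    print(error_rate(ref, read))
```

In the toy alignment the deletion falls on the last base of the T run, which mirrors the kind of homopolymer-end deletion the abstract describes. Real data would come from an aligner's output rather than hand-written strings.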