Philipps-University of Marburg, Department of Mathematics and Computer Science, Marburg, 35032, Germany.
Sci Rep. 2020 Apr 1;10(1):5750. doi: 10.1038/s41598-020-62675-8.
Next-generation sequencing (NGS) makes it possible to sequence millions to billions of DNA sequences in a short period, enabling novel applications in personalized medicine, such as cancer diagnostics or antiviral therapy. However, sequencing technologies differ in their error rates, and these errors arise during the sequencing process itself. When NGS data are used for diagnostics, sequences containing errors are typically neglected or a worst-case scenario is assumed. In the current study, we focused on the impact of ambiguous bases on therapy recommendations for Human Immunodeficiency Virus 1 (HIV-1) patients. Specifically, we analyzed treatment recommendations for entry blockers based on prediction models for co-receptor tropism. We compared three error handling strategies that have been used in the literature, namely (i) neglection, (ii) worst-case assumption, and (iii) deconvolution with a majority vote. We show that with two or more ambiguous positions per sequence, a reliable prediction is generally no longer possible. Moreover, the position of the ambiguity also plays a crucial role. We therefore analyzed the error probability distributions of existing sequencing technologies, e.g., Illumina MiSeq or PacBio, with respect to these error handling strategies. Neglection outperforms the other strategies when no systematic errors are present; in other cases, the deconvolution strategy with the majority vote should be preferred.
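The three error handling strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tie-breaking rule, and the `toy_unfavourable` predictor (a stand-in for a real co-receptor tropism model) are assumptions made for illustration only. The only fixed ingredient is the standard IUPAC nucleotide ambiguity code table, which is used to expand ("deconvolve") an ambiguous sequence into all concrete sequences it could represent.

```python
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_ambiguous(seq):
    """True if the sequence contains at least one ambiguous base."""
    return any(len(IUPAC[base]) > 1 for base in seq)


def deconvolve(seq):
    """Expand a sequence into every concrete sequence it may stand for."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


def toy_unfavourable(seq):
    """Hypothetical predictor, NOT a real tropism model.

    Returns True when the (made-up) rule says an entry blocker
    should not be recommended for this sequence.
    """
    return "GG" in seq


def neglect(seq, predict):
    """(i) Neglection: discard ambiguous sequences (None = no prediction)."""
    return None if is_ambiguous(seq) else predict(seq)


def worst_case(seq, predict):
    """(ii) Worst case: unfavourable if any possible expansion is."""
    return any(predict(s) for s in deconvolve(seq))


def majority_vote(seq, predict):
    """(iii) Deconvolution with majority vote over all expansions.

    Ties resolve to False here; that tie rule is an arbitrary assumption.
    """
    votes = Counter(predict(s) for s in deconvolve(seq))
    return votes[True] > votes[False]


if __name__ == "__main__":
    for seq in ("GGGA", "ANGT", "GGNA"):
        print(seq,
              neglect(seq, toy_unfavourable),
              worst_case(seq, toy_unfavourable),
              majority_vote(seq, toy_unfavourable))
```

Note that the number of expansions grows multiplicatively with each ambiguous position (a single `N` contributes a factor of four), so full enumeration as sketched here is only practical for sequences with few ambiguous bases.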