MAGI Euregio, Bolzano,
Eur Rev Med Pharmacol Sci. 2019 Sep;23(18):8139-8147. doi: 10.26355/eurrev_201909_19034.
While next-generation sequencing (NGS) has become the technology of choice for clinical diagnostics, most genetic laboratories still use Sanger sequencing for orthogonal confirmation of NGS results. Previous studies have shown that when the quality of NGS data is high, most calls are confirmed by Sanger sequencing, making confirmation redundant. We aimed to establish a set of criteria that distinguish NGS calls that need orthogonal confirmation from those that do not, which would significantly decrease the amount of work necessary to reach a diagnosis.
A dataset of 7976 NGS calls, each classified as a true or false positive by Sanger sequencing, was used to train and test different machine learning (ML) approaches. By varying the size and class balance of the training dataset, we measured the performance of the different algorithms to determine the conditions under which ML is a valid approach for confirming NGS calls in a diagnostic environment.
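As an illustration only, varying the size and class balance of a training set could be sketched as below. The function name `subsample`, its parameters, the `(features, label)` representation of a call, and the toy data are assumptions made for this example; the abstract does not describe the actual implementation, features, or algorithms used.

```python
import random


def subsample(calls, size, false_fraction, seed=0):
    """Draw a training subset with a chosen size and class balance.

    calls: list of (features, label) pairs, where label is True for a
           Sanger-confirmed true positive and False for a false positive
           (a hypothetical representation, not taken from the paper).
    size: number of calls in the returned subset.
    false_fraction: requested share of false positives in the subset.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    n_false = round(size * false_fraction)
    n_true = size - n_false

    true_calls = [c for c in calls if c[1]]
    false_calls = [c for c in calls if not c[1]]
    if n_true > len(true_calls) or n_false > len(false_calls):
        raise ValueError("not enough calls of one class for the requested subset")

    # Sample each class separately without replacement, then mix them.
    subset = rng.sample(true_calls, n_true) + rng.sample(false_calls, n_false)
    rng.shuffle(subset)
    return subset


if __name__ == "__main__":
    # Synthetic toy data: 300 true positives and 100 false positives.
    toy_calls = [({"quality": i}, i % 4 != 0) for i in range(400)]
    for size in (50, 100):
        for false_fraction in (0.1, 0.5):
            train = subsample(toy_calls, size, false_fraction)
            n_false = sum(1 for _, label in train if not label)
            print(size, false_fraction, len(train), n_false)
```

Each subset produced this way would then be passed to whichever classifier is being evaluated, so that performance can be compared across sizes and class balances.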
Our results indicate that machine learning is a valid approach for finding variant calls that need further investigation. However, to reach the high accuracy required in a clinical environment, the training dataset must include enough observations, and these observations must be well balanced between true-positive and false-positive NGS calls.
Our results show that it is possible to integrate a machine learning approach into the diagnostic NGS validation workflow to reduce the number of Sanger confirmations of high-quality NGS calls, reducing the time and cost of diagnosis.