Competence Centre on Health Technologies, Tartu, Estonia.
Institute of Biomedicine and Translational Medicine, Department of Biomedicine, University of Tartu, Tartu, Estonia.
PLoS One. 2019 Jul 8;14(7):e0209139. doi: 10.1371/journal.pone.0209139. eCollection 2019.
Non-invasive prenatal testing (NIPT) enables accurate detection of fetal chromosomal trisomies. The majority of publicly available computational methods for sequencing-based NIPT analysis rely on low-coverage whole-genome sequencing (WGS) data and are not applicable to targeted high-coverage sequencing data from cell-free DNA samples. Here, we present a novel computational framework for targeted high-coverage sequencing-based NIPT analysis. The framework uses a hidden Markov model (HMM) in conjunction with a supplemental machine learning model, such as a decision tree (DT) or support vector machine (SVM), to detect fetal trisomy and the parental origin of additional fetal chromosomes. These models were developed using simulated datasets covering a wide range of biologically relevant scenarios with various chromosomal quantities, parental origins of extra chromosomes, fetal DNA fractions, and sequencing read depths. The developed models were tested on simulated and experimental targeted sequencing datasets. We determined the functional feasibility and limitations of each proposed approach and demonstrated that the read count-based HMM achieved the best overall classification accuracy, 0.89, for detecting fetal euploidies and trisomies on the simulated dataset. Furthermore, we show that applying the DT and SVM to the HMM classification results increased the final trisomy classification accuracy to 0.98 and 0.99, respectively. We demonstrate that read count and allelic ratio-based models can achieve high accuracy (up to 0.98) for detecting fetal trisomy even if the fetal fraction is as low as 2%. Existing commercial NIPT analysis currently requires a fetal fraction of at least 4%, which can be a challenge at early gestational age (<10 weeks) or with high maternal body mass index (>35 kg/m²).
More accurate detection can be achieved at higher sequencing depth using the HMM in conjunction with supplemental models, which significantly improve trisomy detection, especially in borderline scenarios (e.g., very low fetal fraction), and make it possible to perform NIPT even earlier than 10 weeks of pregnancy.
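The dependence on fetal fraction and sequencing depth can be illustrated with a short calculation. The sketch below is not the framework described above. It only uses the standard dosage argument that a trisomic fetus contributes three copies of the affected chromosome instead of two, and it assumes pure Poisson counting noise, which ignores overdispersion, amplification bias, and the allelic ratio information the models also use. The function names and the read counts are hypothetical.

```python
import math


def trisomy_read_ratio(fetal_fraction: float) -> float:
    """Expected read-count ratio (trisomic vs. euploid fetus) on the affected chromosome.

    Maternal DNA (weight 1 - f) carries 2 copies; fetal DNA (weight f) carries 3.
    Dividing by the euploid dosage of 2 gives 1 + f / 2.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    return ((1.0 - fetal_fraction) * 2.0 + fetal_fraction * 3.0) / 2.0


def poisson_z_score(expected_reads: float, fetal_fraction: float) -> float:
    """Approximate z-score of the trisomy read excess.

    Assumption: read counts are Poisson distributed, so the standard
    deviation of the euploid count is sqrt(expected_reads).
    """
    if expected_reads <= 0:
        raise ValueError("expected_reads must be positive")
    excess = expected_reads * (trisomy_read_ratio(fetal_fraction) - 1.0)
    return excess / math.sqrt(expected_reads)


if __name__ == "__main__":
    # Hypothetical read totals for the tested chromosome, at 2% fetal fraction.
    for reads in (10_000, 100_000, 1_000_000):
        z = poisson_z_score(reads, 0.02)
        print(f"reads={reads:>9,d}  ratio={trisomy_read_ratio(0.02):.3f}  z={z:.2f}")
```

Under these assumptions a 2% fetal fraction shifts the expected read count by only 1%, and the z-score grows with the square root of the read count, which is consistent with the statement that higher sequencing depth helps most in low fetal fraction cases.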