School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae266.
Non-invasive prenatal testing (NIPT) is a widely used approach for detecting fetal genomic aneuploidies. However, limits on sequencing read length and coverage create a bottleneck for further improving its performance and for earlier detection. The errors mainly come from reference bias and population polymorphism. To break this bottleneck, we propose NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from the tested population. We then propose a sequence-to-graph alignment method that accounts for read mismatch rates during mapping, together with an indexing method that uses hash indexing and adjacency lists to accelerate read alignment. Finally, by integrating multi-source aligned reads and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing datasets from pregnant women. The results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct (PAC) learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective on fetal chromosomal aneuploidy detection. It may have broad applications in clinical testing, and its detection results can serve as a reference for false-positive samples approaching the critical threshold.
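For readers unfamiliar with the baseline, the standard z-score test mentioned above can be sketched as follows. This is a minimal illustration of the conventional read-count z-score, not the NIPT-PG method itself: the function names, the toy read counts, and the cutoff of 3 are assumptions chosen for the example and are not taken from the paper.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts, chrom):
    """Fraction of all mapped reads that fall on `chrom`."""
    total = sum(read_counts.values())
    return read_counts[chrom] / total


def z_score(sample_counts, reference_counts, chrom):
    """Conventional z-score of a sample against euploid reference samples.

    `reference_counts` is a list of per-sample read-count dicts that are
    assumed (for this sketch) to come from euploid pregnancies.
    """
    ref_fractions = [chromosome_fraction(c, chrom) for c in reference_counts]
    mu = mean(ref_fractions)
    sigma = stdev(ref_fractions)  # sample standard deviation
    return (chromosome_fraction(sample_counts, chrom) - mu) / sigma


def is_aneuploid(z, threshold=3.0):
    """Flag a sample when |z| exceeds the cutoff (3.0 is an assumed value)."""
    return abs(z) > threshold


# Toy data: each sample has 10,000 reads in total (hypothetical numbers).
reference = [
    {"chr21": 100, "other": 9900},
    {"chr21": 102, "other": 9898},
    {"chr21": 98, "other": 9902},
    {"chr21": 101, "other": 9899},
    {"chr21": 99, "other": 9901},
]
sample = {"chr21": 110, "other": 9890}

z = z_score(sample, reference, "chr21")
print(f"z = {z:.2f}, flagged = {is_aneuploid(z)}")
```

In this baseline the z-score depends only on read counts from a single linear reference, which is where reference bias and population polymorphism can distort the chromosome fractions.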