Liu Guojun, Zhang Junying, Yuan Xiguo, Wei Chao
School of Computer Science and Technology, Xidian University, Xi'an, China.
Front Genet. 2020 Nov 4;11:569227. doi: 10.3389/fgene.2020.569227. eCollection 2020.
Copy number variations (CNVs) are significant causes of many human cancers and genetic diseases. Detecting CNVs from next-generation sequencing (NGS) data has become a common way to analyze human diseases. However, effective detection of insignificant CNVs is still a challenging task. In this study, we propose a new detection method, RKDOSCNV, to meet this need. RKDOSCNV uses a kernel density estimation method to evaluate the local kernel density distribution of each read depth segment (RDS) based on an expanded nearest-neighbor set (the k-nearest neighbors, reverse nearest neighbors, and shared nearest neighbors of each RDS), and assigns a relative kernel density outlier score (RKDOS) to each RDS. From the RKDOS profile, RKDOSCNV predicts candidate CNVs by choosing a reasonable threshold, and then uses a split-read approach to correct the boundaries of the candidate CNVs. The performance of RKDOSCNV is assessed by comparing it with several currently popular methods in experiments on simulated and real data at different tumor purity levels. The experimental results show that RKDOSCNV outperforms the compared methods. In summary, RKDOSCNV is a simple and effective method for detecting CNVs from whole genome sequencing (WGS) data, especially for samples with low tumor purity.
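The scoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `expanded_neighbors` and `rkdos_scores`, the Gaussian kernel, the fixed bandwidth `h`, the use of a single read-depth value per RDS, and the definition of the score as the mean neighbor density divided by the point's own density are all assumptions made for illustration. The split-read boundary correction is not covered.

```python
import numpy as np


def expanded_neighbors(x, k):
    """For each point, return the union of its k-nearest neighbors,
    reverse nearest neighbors, and shared nearest neighbors (hypothetical helper)."""
    n = len(x)
    dist = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]     # indices of the k nearest points
    knn_sets = [set(row.tolist()) for row in knn]
    neighbors = []
    for i in range(n):
        # reverse neighbors: points that list i among their own k nearest
        rnn = {j for j in range(n) if i in knn_sets[j]}
        # shared neighbors: points that have at least one nearest neighbor in common with i
        snn = {j for j in range(n) if j != i and knn_sets[i] & knn_sets[j]}
        neighbors.append(sorted(knn_sets[i] | rnn | snn))
    return neighbors


def rkdos_scores(x, k=3, h=0.2):
    """Relative kernel density outlier score per point (assumed definition:
    mean density of the expanded neighbors divided by the point's own density)."""
    x = np.asarray(x, dtype=float)
    nbrs = expanded_neighbors(x, k)
    norm = 1.0 / (h * np.sqrt(2.0 * np.pi))   # Gaussian kernel normalization
    density = np.empty(len(x))
    for i, s in enumerate(nbrs):
        u = (x[i] - x[s]) / h
        kern = norm * np.exp(-0.5 * u ** 2)
        # include the point itself (kernel value `norm`) so the density is never zero
        density[i] = (kern.sum() + norm) / (len(s) + 1)
    return np.array([density[s].mean() / density[i] for i, s in enumerate(nbrs)])


# Toy read-depth profile: index 6 is an amplified segment among normal ones.
depth = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 3.0, 1.02, 0.98, 1.0]
scores = rkdos_scores(depth, k=3, h=0.2)
candidates = np.flatnonzero(scores > 2.0)     # threshold chosen arbitrarily here
```

Segments whose score is well above 1 sit in a sparser region than their neighbors and would be flagged as candidate CNVs. In practice the threshold, `k`, and `h` would need to be tuned to the data.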