School of Computer Science and Technology, Liaocheng University, Liaocheng, China.
School of Computer Science and Technology, Liaocheng University, Liaocheng, China; School of Information Engineering, HengXing University, Qingdao, China.
Comput Biol Chem. 2022 Aug;99:107720. doi: 10.1016/j.compbiolchem.2022.107720. Epub 2022 Jun 25.
Copy number variation (CNV) is an important class of structural variation in the genome. Next-generation sequencing (NGS) technology is widely used to detect CNVs because it offers high throughput and low cost across the whole genome. Building on the original MFCNV method, this paper proposes an improved CNV detection method called CNVABNN. Compared with MFCNV, CNVABNN has three advantages: (1) It extends the set of detectable categories by refining the loss category into hemi_loss and homo_loss. (2) It applies ensemble learning: the AdaBoost algorithm serves as the core framework and neural networks serve as weak classifiers, which CNVABNN combines into a single strong classifier that improves overall CNV detection performance. (3) It refines detection by predicting CNVs twice, using neural networks and a voting mechanism. To evaluate CNVABNN, it is compared with six existing detection methods. The experimental results show that CNVABNN achieves better precision, sensitivity, and F1-score on both simulated and real samples.
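The ensemble step described in point (2) can be illustrated with a minimal sketch of multi-class AdaBoost in its SAMME form. This is not the authors' implementation: the abstract does not state which AdaBoost variant CNVABNN uses, so SAMME is an assumption, as are the function names and the `normal` and `gain` labels (only `hemi_loss` and `homo_loss` are named in the abstract). The weak classifiers, which are neural networks in the paper, are represented here only by their predicted labels.

```python
import math
from collections import defaultdict

# Hypothetical label set: hemi_loss and homo_loss come from the abstract,
# "normal" and "gain" are assumed for illustration.
CLASSES = ["normal", "gain", "hemi_loss", "homo_loss"]


def weighted_error(weights, y_true, y_pred):
    """Sum of the sample weights of the misclassified samples."""
    return sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)


def samme_alpha(error, n_classes):
    """Vote weight of one weak classifier under SAMME (assumed variant)."""
    eps = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    return math.log((1 - eps) / eps) + math.log(n_classes - 1)


def update_sample_weights(weights, y_true, y_pred, alpha):
    """Up-weight misclassified samples, then renormalise so weights sum to 1."""
    raised = [
        w * math.exp(alpha) if t != p else w
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(raised)
    return [w / total for w in raised]


def strong_predict(weak_labels, alphas):
    """Combine weak-classifier labels for one sample by alpha-weighted vote."""
    scores = defaultdict(float)
    for label, alpha in zip(weak_labels, alphas):
        scores[label] += alpha
    return max(scores, key=scores.get)


# One boosting round on four toy samples with uniform starting weights.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = ["gain", "normal", "hemi_loss", "homo_loss"]
y_pred = ["gain", "normal", "hemi_loss", "hemi_loss"]  # last sample is wrong

err = weighted_error(weights, y_true, y_pred)
alpha = samme_alpha(err, len(CLASSES))
weights = update_sample_weights(weights, y_true, y_pred, alpha)
```

After this round the misclassified `homo_loss` sample carries most of the weight, so the next weak classifier is pushed to focus on it. At prediction time, `strong_predict` returns the label with the largest total vote weight across the weak classifiers.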