CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data.

Author information

Zhang Tong, Dong Jinxin, Jiang Hua, Zhao Zuyao, Zhou Mengjiao, Yuan Tianting

Affiliations

School of Computer Science and Technology, Liaocheng University, Liaocheng, China.

College of Clinical Medicine, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China.

Publication information

Front Bioeng Biotechnol. 2022 Dec 1;10:1000638. doi: 10.3389/fbioe.2022.1000638. eCollection 2022.

Abstract

Copy number variations (CNVs) significantly influence the diversity of the human genome and the occurrence of many complex diseases. Next-generation sequencing (NGS) technology provides rich data for detecting CNVs, and the read depth (RD)-based approach is widely used. However, low-CN duplication events (copy number of 3-4) are challenging for existing methods to identify, especially when the CNVs are small. In addition, the RD-based approach can only obtain rough breakpoints. We propose a new method, CNV-PCC (detection of CNVs based on Principal Component Classifier), to identify CNVs in whole-genome sequencing data. CNV-PCC first uses the split-read signal to search for potential breakpoints. A two-stage segmentation strategy is then applied to improve the identification of low-CN duplications and small CNVs. Next, an outlier score is calculated for each segment by PCC (Principal Component Classifier). Finally, the Otsu algorithm calculates the threshold used to determine the CNV regions. Analysis of the simulated data indicates that CNV-PCC outperforms the other methods in sensitivity and F1-score and improves breakpoint accuracy. Furthermore, CNV-PCC shows high consistency with other methods on real sequencing samples. This study demonstrates that CNV-PCC is an effective method for detecting CNVs, even for low-CN duplications and small CNVs.
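The last two steps described above (scoring each segment with a principal component classifier, then choosing a cutoff with the Otsu algorithm) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the CNV-PCC implementation: the per-segment features, the 90% explained-variance rule for picking major components, the 256-bin histogram, and the helper names `pcc_outlier_scores`, `otsu_threshold` and `flag_cnv_segments` are all hypothetical. The abstract does not say which features CNV-PCC uses or how it combines component scores, and the split-read breakpoint search and two-stage segmentation are not modeled here.

```python
import numpy as np


def pcc_outlier_scores(features, major_fraction=0.9):
    """Major-component outlier score for each segment (row of `features`).

    Hypothetical helper: score = sum over major components of y_i**2 / lambda_i,
    where y_i is the projection of the standardized row onto the i-th
    eigenvector of the feature correlation matrix and lambda_i its eigenvalue.
    """
    x = np.asarray(features, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns become all-zero after centering
    z = (x - x.mean(axis=0)) / std
    corr = np.atleast_2d(np.cov(z, rowvar=False))  # correlation matrix of x
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]  # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    total = eigvals.sum()
    if total <= 0:
        return np.zeros(len(x))
    # Assumed rule: keep the fewest components explaining >= major_fraction.
    cum = np.cumsum(eigvals) / total
    q = min(int(np.searchsorted(cum, major_fraction)) + 1, len(eigvals))
    y = z @ eigvecs[:, :q]
    return np.sum(y ** 2 / np.maximum(eigvals[:q], 1e-12), axis=1)


def otsu_threshold(scores, nbins=256):
    """Otsu's method: the histogram cut that maximizes between-class variance."""
    hist, edges = np.histogram(np.asarray(scores, dtype=float), bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)  # count in the lower class (bins 0..k)
    w1 = w0[-1] - w0  # count in the upper class (bins k+1..end)
    m0 = np.cumsum(hist * centers)
    mu0 = np.divide(m0, w0, out=np.zeros_like(m0), where=w0 > 0)
    mu1 = np.divide(m0[-1] - m0, w1, out=np.zeros_like(m0), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    if between[k] <= 0:
        return float("inf")  # all scores fall in one bin: nothing to split
    return float(edges[k + 1])  # upper edge of the last lower-class bin


def flag_cnv_segments(features, major_fraction=0.9):
    """Hypothetical helper: True where a segment's score reaches the threshold."""
    scores = pcc_outlier_scores(features, major_fraction)
    return scores >= otsu_threshold(scores)
```

Because Otsu's method picks the cutoff that maximizes the between-class variance of the score histogram, no threshold has to be tuned by hand. On synthetic data where a few segments have clearly elevated depth against a uniform background, the flagged set should be exactly those segments; real data with noisier scores will not separate so cleanly.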

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ff3/9751350/83f1ca23d2cc/fbioe-10-1000638-g001.jpg

Similar articles

1
Noise cancellation using total variation for copy number variation detection.
BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):361. doi: 10.1186/s12859-018-2332-x.
2
HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data.
Front Genet. 2021 Jun 7;12:642473. doi: 10.3389/fgene.2021.642473. eCollection 2021.
3
A Local Outlier Factor-Based Detection of Copy Number Variations From NGS Data.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1811-1820. doi: 10.1109/TCBB.2019.2961886. Epub 2021 Oct 7.
4
A shortest path-based approach for copy number variation detection from next-generation sequencing data.
Front Genet. 2023 Jan 17;13:1084974. doi: 10.3389/fgene.2022.1084974. eCollection 2022.
5
GROM-RD: resolving genomic biases to improve read depth detection of copy number variants.
PeerJ. 2015 Mar 17;3:e836. doi: 10.7717/peerj.836. eCollection 2015.
6
Evaluation of copy number variant detection from panel-based next-generation sequencing data.
Mol Genet Genomic Med. 2019 Jan;7(1):e00513. doi: 10.1002/mgg3.513. Epub 2018 Nov 22.
7
A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data.
Front Genet. 2021 Jan 13;11:632311. doi: 10.3389/fgene.2020.632311. eCollection 2020.

