A shortest path-based approach for copy number variation detection from next-generation sequencing data.

Authors

Liu Guojun, Yang Hongzhi, Yuan Xiguo

Affiliations

School of Statistics, Xi'an University of Finance and Economics, Xi'an, China.

Medical Imaging Center, Xidian Group Hospital, Xi'an, China.

Publication information

Front Genet. 2023 Jan 17;13:1084974. doi: 10.3389/fgene.2022.1084974. eCollection 2022.

DOI:10.3389/fgene.2022.1084974
PMID:36733945
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9887524/
Abstract

Copy number variation (CNV) is one of the main types of structural variation in the human genome and accounts for a considerable proportion of genomic variation. Because CNVs can directly or indirectly cause cancer, mental illness, and genetic disease in humans, their effective detection is of great interest in oncogene discovery, clinical decision-making, bioinformatics, and drug discovery. The advent of next-generation sequencing data has made CNV detection possible, and a large number of CNV detection tools are based on next-generation sequencing data. Owing to the complexity of next-generation sequencing data (e.g., bias, noise, alignment errors) and of CNV structures, the accuracy of existing CNV detection methods remains low. In this work, we design a new CNV detection approach, called shortest path-based copy number variation (SPCNV), to improve CNV detection accuracy. SPCNV calculates the k nearest neighbors of each read depth and defines the shortest path, shortest path relation, and shortest path cost sets, from which it calculates the mean shortest path cost of each read depth and of each of its k nearest neighbors. We use the ratio between the mean shortest path cost of each read depth and the mean of the mean shortest path costs of its k nearest neighbors to construct a relative shortest path score formula that assigns a score to each read depth. A boxplot is then applied to the resulting score profile to predict CNVs. The performance of the proposed method is verified by experiments on simulated data and compared against several popular methods of the same type. Experimental results show that the proposed method achieves the best balance between recall and precision in each set of simulated samples. To further verify its performance in real application scenarios, we then conduct experiments on real sample data selected from the 1,000 Genomes Project. The proposed method achieves the best F1-scores in almost all samples. Therefore, it can be used as a more reliable tool for the routine detection of CNVs.
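The abstract describes the SPCNV score only in words, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that read depths are already binned along the genome, that the distance between two bins is the absolute difference of their read depths, and that the "shortest path" over a bin and its k nearest neighbors is grown greedily by always linking the unvisited neighbor closest to any visited bin (the set-based nearest path used by the connectivity-based outlier factor, COF). The "mean shortest path cost" is taken here as a plain average of the link lengths; COF itself uses a position-weighted average, and the paper's exact definition may differ. The boxplot step uses the conventional upper fence Q3 + 1.5 × IQR. The whisker factor, the `eps` smoothing term, the gain/loss labelling against the median read depth, the merging of adjacent flagged bins, and all function names (`knn`, `mean_path_cost`, `relative_scores`, `call_cnvs`) are hypothetical choices for illustration.

```python
"""Toy sketch of a COF-style relative shortest-path score for read-depth bins.

Hypothetical illustration only: not the SPCNV authors' code.
"""
import statistics
from typing import List, Sequence, Tuple


def knn(rd: Sequence[float], k: int) -> List[List[int]]:
    """Indices of the k bins closest to each bin in read-depth value (ties by index)."""
    n = len(rd)
    return [
        [j for _, j in sorted((abs(rd[i] - rd[j]), j) for j in range(n) if j != i)[:k]]
        for i in range(n)
    ]


def mean_path_cost(rd: Sequence[float], i: int, neighbours: Sequence[int]) -> float:
    """Greedily grow a path from bin i over its neighbours.

    Each step links the unvisited neighbour closest to ANY visited bin (assumed
    'shortest path' rule, as in COF's set-based nearest path) and records that
    link length as a path cost. Returns the plain mean of the costs.
    """
    visited, remaining, costs = [i], list(neighbours), []
    while remaining:
        cost, nxt = min((abs(rd[v] - rd[r]), r) for v in visited for r in remaining)
        costs.append(cost)
        visited.append(nxt)
        remaining.remove(nxt)
    return sum(costs) / len(costs)


def relative_scores(rd: Sequence[float], k: int, eps: float = 1e-9) -> List[float]:
    """Score = own mean path cost / mean of the neighbours' mean path costs.

    eps is a hypothetical smoothing term so that all-zero costs give a score of 1.
    """
    nbrs = knn(rd, k)
    mpc = [mean_path_cost(rd, i, nbrs[i]) for i in range(len(rd))]
    return [
        (mpc[i] + eps) / (sum(mpc[j] for j in nb) / len(nb) + eps)
        for i, nb in enumerate(nbrs)
    ]


def call_cnvs(rd: Sequence[float], k: int = 4,
              whisker: float = 1.5) -> List[Tuple[int, int, str]]:
    """Flag bins above the boxplot upper fence and merge adjacent calls."""
    if not 1 <= k < len(rd):
        raise ValueError("need 1 <= k < number of bins")
    scores = relative_scores(rd, k)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    fence = q3 + whisker * (q3 - q1)  # upper whisker of the boxplot
    median_rd = statistics.median(rd)
    segments: List[Tuple[int, int, str]] = []
    for pos, (depth, score) in enumerate(zip(rd, scores)):
        if score <= fence:
            continue
        kind = "gain" if depth > median_rd else "loss"  # hypothetical labelling
        if segments and segments[-1][1] == pos - 1 and segments[-1][2] == kind:
            segments[-1] = (segments[-1][0], pos, kind)  # extend the open segment
        else:
            segments.append((pos, pos, kind))
    return segments


if __name__ == "__main__":
    normal = list(range(100, 140))  # 40 toy bins with distinct, evenly spaced depths
    rd = normal[:10] + [300, 301, 302] + normal[10:25] + [20, 21] + normal[25:]
    print(call_cnvs(rd, k=4))  # [(10, 12, 'gain'), (28, 29, 'loss')]
```

In this toy profile every normal bin has a mean path cost of 1 and scores exactly 1, so Q1 and Q3 coincide and the fence collapses to 1.0; only the two planted outlier runs score above it (about 1.95 for the gain bins and 3.49 for the loss bins). With noisy real read depths the interquartile range is positive. The quadratic neighbour search is only for readability: because read depths are one-dimensional, sorting them once would yield each bin's neighbours far more cheaply on whole-genome data.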

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee34/9887524/067a4f13d783/fgene-13-1084974-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee34/9887524/0d6086fc9d0e/fgene-13-1084974-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee34/9887524/db9e74a37235/fgene-13-1084974-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee34/9887524/66c9253a347b/fgene-13-1084974-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee34/9887524/666e8e39baec/fgene-13-1084974-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee34/9887524/02128e13aca8/fgene-13-1084974-g005.jpg

Similar articles

1
A shortest path-based approach for copy number variation detection from next-generation sequencing data.
Front Genet. 2023 Jan 17;13:1084974. doi: 10.3389/fgene.2022.1084974. eCollection 2022.
2
Detection of copy number variations based on a local distance using next-generation sequencing data.
Front Genet. 2023 Sep 22;14:1147761. doi: 10.3389/fgene.2023.1147761. eCollection 2023.
3
Noise cancellation using total variation for copy number variation detection.
BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):361. doi: 10.1186/s12859-018-2332-x.
4
A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data.
Front Genet. 2021 Jun 28;12:699510. doi: 10.3389/fgene.2021.699510. eCollection 2021.
5
CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data.
Front Bioeng Biotechnol. 2022 Dec 1;10:1000638. doi: 10.3389/fbioe.2022.1000638. eCollection 2022.
6
RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data.
Front Genet. 2020 Nov 4;11:569227. doi: 10.3389/fgene.2020.569227. eCollection 2020.
7
HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data.
Front Genet. 2021 Jun 7;12:642473. doi: 10.3389/fgene.2021.642473. eCollection 2021.
8
CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data.
Front Genet. 2021 Aug 16;12:700874. doi: 10.3389/fgene.2021.700874. eCollection 2021.
9
CIRCNV: Detection of CNVs Based on a Circular Profile of Read Depth from Sequencing Data.
Biology (Basel). 2021 Jun 25;10(7):584. doi: 10.3390/biology10070584.
10
Detecting copy number variation in next generation sequencing data from diagnostic gene panels.
BMC Med Genomics. 2021 Aug 31;14(1):214. doi: 10.1186/s12920-021-01059-x.

Cited by

1
Multi-tool copy number detection highlights common body size-associated variants in miniature pig breeds from different geographical regions.
BMC Genomics. 2025 Mar 22;26(1):285. doi: 10.1186/s12864-025-11446-8.
2
A copy number variation detection method based on OCSVM algorithm using multi strategies integration.
Sci Rep. 2025 Jan 28;15(1):3526. doi: 10.1038/s41598-025-88143-9.
3
Genome-Wide Scan for Copy Number Variations in Chinese Merino Sheep Based on Ovine High-Density 600K SNP Arrays.
Animals (Basel). 2024 Oct 8;14(19):2897. doi: 10.3390/ani14192897.
4
Detection of copy number variations based on a local distance using next-generation sequencing data.
Front Genet. 2023 Sep 22;14:1147761. doi: 10.3389/fgene.2023.1147761. eCollection 2023.