CNVind: an open source cloud-based pipeline for rare CNVs detection in whole exome sequencing data based on the depth of coverage.

Affiliation

Warsaw University of Technology, Institute of Computer Science, Nowowiejska 15/19, 00-665, Warsaw, Poland.

Publication

BMC Bioinformatics. 2022 Mar 5;23(1):85. doi: 10.1186/s12859-022-04617-x.

DOI: 10.1186/s12859-022-04617-x
PMID: 35247967
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8897915/
Abstract

BACKGROUND

A typical copy number variation (CNV) detection process based on the depth of coverage in whole exome sequencing (WES) data consists of several steps: (I) calculating the depth of coverage in the sequencing regions, (II) quality control, (III) normalizing the depth of coverage, and (IV) calling CNVs. Previous tools performed one normalization per chromosome: all the coverage depths in the sequencing regions of a given chromosome were normalized in a single run.
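The four-step structure, with normalization run once per chromosome as in the earlier tools, can be sketched as below. Every method here is a simplified stand-in chosen for illustration, not the algorithm of any particular tool: step I is assumed to be done already (depths arrive as a dict keyed by hypothetical `(chromosome, start, end)` tuples), the QC threshold `min_median` and the cut-offs `del_cut`/`dup_cut` are hypothetical, normalization is a simple per-sample scaling followed by a per-region median ratio, and calling thresholds individual regions instead of segmenting them.

```python
from collections import defaultdict
from statistics import median

# Depths: {(chromosome, start, end): [depth in sample 0, depth in sample 1, ...]}.
# Step I (computing these depths from aligned reads) is assumed to be done already.


def quality_control(depth, min_median=20.0):
    """Step II: keep regions whose median depth reaches a (hypothetical) threshold."""
    return {region: row for region, row in depth.items() if median(row) >= min_median}


def normalize_rows(rows):
    """Stand-in normalization: scale each sample by its total depth over `rows`,
    then divide each region by its own median, so a normal copy number gives ~1.0."""
    n_samples = len(rows[0])
    totals = [sum(row[j] for row in rows) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    factors = [t / mean_total if t > 0 else 1.0 for t in totals]
    ratios = []
    for row in rows:
        scaled = [row[j] / factors[j] for j in range(n_samples)]
        m = median(scaled)
        ratios.append([v / m if m > 0 else 0.0 for v in scaled])
    return ratios


def normalize_per_chromosome(depth):
    """Step III as in earlier tools: one normalization run covering every region of a chromosome."""
    by_chrom = defaultdict(list)
    for region in sorted(depth):
        by_chrom[region[0]].append(region)
    ratios = {}
    for regions in by_chrom.values():
        normalized = normalize_rows([depth[r] for r in regions])  # all regions at once
        ratios.update(zip(regions, normalized))
    return ratios


def call_cnvs(ratios, del_cut=0.7, dup_cut=1.3):
    """Step IV, simplified: threshold each region/sample ratio (real callers segment first)."""
    calls = []
    for region, row in sorted(ratios.items()):
        for sample, r in enumerate(row):
            if r < del_cut:
                calls.append((region, sample, "DEL"))
            elif r > dup_cut:
                calls.append((region, sample, "DUP"))
    return calls
```

In this stand-in, all regions of a chromosome share the same per-sample scaling factors, so a CNV in one region slightly shifts the ratios of every other region of that chromosome in the affected sample.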

METHODS

Herein, we present CNVind, a new tool for calling CNVs in which the normalization process is conducted separately for each sequencing region. The total number of normalizations therefore equals the number of sequencing regions in the investigated dataset: when analyzing a dataset composed of n sequencing regions, CNVind performs n independent depth-of-coverage normalizations. Before each normalization, the application selects the k sequencing regions whose depth of coverage is most correlated with that of the region being normalized, using Pearson's correlation as the distance metric. The resulting subgroup of k + 1 sequencing regions (the region itself plus its k most correlated regions) is then normalized. The results of all n independent normalizations are combined, and finally, segmentation and CNV calling are performed on the resulting dataset.
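The per-region procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CNVind's implementation: the depth matrix (one row per sequencing region, one column per sample) is assumed to have passed quality control already, correlations are computed across samples, and `normalize_group` is a hypothetical stand-in (per-sample scaling within the subgroup, then a per-region median ratio) because the normalization model applied to each subgroup is not described in this abstract. Segmentation and calling are omitted.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two depth vectors measured across the same samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation; treat it as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def normalize_group(rows):
    """Hypothetical stand-in normalization of one subgroup (not CNVind's model):
    scale each sample by its total depth over the subgroup, then divide each
    region by its median, so a normal copy number gives a ratio near 1.0."""
    n_samples = len(rows[0])
    totals = [sum(row[j] for row in rows) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    factors = [t / mean_total if t > 0 else 1.0 for t in totals]
    out = []
    for row in rows:
        scaled = [row[j] / factors[j] for j in range(n_samples)]
        m = median(scaled)
        out.append([v / m if m > 0 else 0.0 for v in scaled])
    return out


def independent_normalize(depth, k):
    """One normalization per sequencing region: n regions -> n normalizations.

    depth[i][j] is the depth of coverage of region i in sample j.
    """
    n = len(depth)
    result = []
    for i in range(n):
        # Rank every other region by how strongly its depth profile correlates with region i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: pearson(depth[i], depth[j]),
            reverse=True,
        )
        subgroup = [i] + others[:k]  # region i plus its k most correlated regions: k + 1 rows
        normalized = normalize_group([depth[r] for r in subgroup])
        result.append(normalized[0])  # keep only region i's normalized row
    return result  # combined results, ready for segmentation and calling
```

Each loop iteration reads only the input matrix, so the n normalizations can be distributed across workers. Ranking all other regions for every region also makes this naive sketch quadratic in n, which gives a feel for why the approach is computationally expensive.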

RESULTS AND CONCLUSIONS

We used WES data from the 1000 Genomes Project to evaluate the impact of independent normalization on CNV calling performance and compared the results with those of two state-of-the-art tools, CODEX and exomeCopy. The results showed that independent normalization significantly improves the specificity of rare CNV detection. For example, on the investigated dataset, we reduced the number of false-positive (FP) calls from over 15,000 to around 5000 while keeping the number of true-positive (TP) calls constant at about 150 CNVs. However, independently normalizing each sequencing region is computationally expensive; therefore, our pipeline is customizable and can easily be run in a cloud computing environment, on a computer cluster, or on a single CPU server. To our knowledge, the presented application is the first attempt to implement independent normalization of the depth of coverage in WES data.
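To put the reported counts in perspective, the precision (the share of calls that are true positives) before and after independent normalization can be worked out from the rounded figures in the abstract. Those figures are approximate ("over 15,000", "around 5000", "about 150"), so the result is only indicative.

```python
def precision(tp, fp):
    """Share of all calls that are true positives: TP / (TP + FP)."""
    return tp / (tp + fp)


TP = 150  # "about 150" true-positive calls, held constant
before = precision(TP, 15_000)  # "over 15,000" false-positive calls before
after = precision(TP, 5_000)  # "around 5000" false-positive calls after

print(f"precision: {before:.3f} -> {after:.3f} ({after / before:.1f}x)")
# prints "precision: 0.010 -> 0.029 (2.9x)"
```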

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/b1de51067f3d/12859_2022_4617_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/e7136c33334a/12859_2022_4617_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/197f087f676f/12859_2022_4617_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/2e8b3d8d8984/12859_2022_4617_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/e585fa333216/12859_2022_4617_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/cb552a5a9d98/12859_2022_4617_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/8897915/d60a133dcc2c/12859_2022_4617_Fig7_HTML.jpg
