Assessing the reproducibility of exome copy number variations predictions.

Author information

Hong Celine S, Singh Larry N, Mullikin James C, Biesecker Leslie G

Affiliations

Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA.

NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA.

Publication information

Genome Med. 2016 Aug 8;8(1):82. doi: 10.1186/s13073-016-0336-6.

DOI:10.1186/s13073-016-0336-6

PMID:27503473

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4976506/

Abstract

BACKGROUND

Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes, and a major challenge is whether their results reproduce in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers, varying sample sizes, and varying capture methodology.

METHODS

Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine reproducibility, we ran the callers on four datasets, on varying sample sizes (N = 10, 30, 75, 100, and 300), and on data with different capture methodology. We examined the false negative (FN) and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking CNV call concordance against single nucleotide polymorphism (SNP) array data.
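The PPV measurement described above can be illustrated with a minimal Python sketch. The abstract does not state the concordance criterion, so the rule used here (same chromosome, same CNV type, and at least 50% of the exome call covered by an array CNV) is an assumption, as are the names `Cnv`, `is_concordant`, and `ppv` and all of the toy coordinates.

```python
from typing import NamedTuple, Sequence


class Cnv(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def is_concordant(call: Cnv, truth: Cnv, min_frac: float = 0.5) -> bool:
    """Assumed rule: same chromosome and type, and the shared span
    covers at least min_frac of the exome call's length."""
    if call.chrom != truth.chrom or call.kind != truth.kind:
        return False
    shared = min(call.end, truth.end) - max(call.start, truth.start)
    return shared > 0 and shared / (call.end - call.start) >= min_frac


def ppv(calls: Sequence[Cnv], array_cnvs: Sequence[Cnv],
        min_frac: float = 0.5) -> float:
    """PPV = concordant calls / all calls (NaN when there are no calls)."""
    if not calls:
        return float("nan")
    true_pos = sum(
        any(is_concordant(c, t, min_frac) for t in array_cnvs) for c in calls
    )
    return true_pos / len(calls)


# Hypothetical toy data, not taken from the study.
exome_calls = [
    Cnv("chr1", 100, 200, "DEL"),  # 80% covered by an array deletion
    Cnv("chr1", 500, 600, "DUP"),  # array reports a deletion here: mismatch
    Cnv("chr2", 100, 200, "DEL"),  # fully covered by an array deletion
    Cnv("chr3", 0, 100, "DEL"),    # no array CNV on chr3
]
array_cnvs = [
    Cnv("chr1", 120, 220, "DEL"),
    Cnv("chr1", 500, 600, "DEL"),
    Cnv("chr2", 90, 210, "DEL"),
]

toy_ppv = ppv(exome_calls, array_cnvs)
print(f"PPV on toy data: {toy_ppv:.2f}")
```

Because the overlap threshold directly changes which calls count as concordant, any PPV comparison between callers or datasets would need to hold that criterion fixed.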

RESULTS

Using independently generated datasets, we examined the PPV for each dataset and observed a wide range of PPVs. The PPVs were highly data dependent (p < 0.001). For the sample size and capture method analyses, we tested the callers in triplicate. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, a negative correlation between PPV and sample size was observed for CoNIFER (ρ = -0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
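As a rough illustration of how a correlation between PPV and sample size could be computed, the Python sketch below implements a rank correlation from scratch. It assumes that ρ denotes Spearman's rank correlation, which the abstract does not state. The sample sizes are those listed in the Methods, but the PPV values are invented for the example and are not the study's results; `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)


sample_sizes = [10, 30, 75, 100, 300]         # sizes listed in the Methods
hypothetical_ppv = [0.9, 0.7, 0.8, 0.5, 0.4]  # invented values, illustration only

rho = spearman_rho(sample_sizes, hypothetical_ppv)
print(f"Spearman rho on toy data: {rho:.2f}")
```

A rank-based measure is used in this sketch because the sample sizes are unevenly spaced, so only the ordering of the PPVs across sizes enters the calculation.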

CONCLUSIONS

Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to include a wider spectrum of CNVs (including small CNVs). These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. These approaches to the evaluation of exome CNV calling are essential to support wide utility and applicability of CNV discovery in exome studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60e/4976506/b2f423f1aac2/13073_2016_336_Fig1_HTML.jpg
