Exome sequence read depth methods for identifying copy number changes.

Author information

Kadalayil Latha, Rafiq Sajjad, Rose-Zerilli Matthew J J, Pengelly Reuben J, Parker Helen, Oscier David, Strefford Jonathan C, Tapper William J, Gibson Jane, Ennis Sarah, Collins Andrew

Publication information

Brief Bioinform. 2015 May;16(3):380-92. doi: 10.1093/bib/bbu027. Epub 2014 Aug 28.

DOI:10.1093/bib/bbu027

PMID:25169955

Abstract

Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease-causal CNVs span or lie within coding regions of the genome (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection, and comparisons have been made to find the most suitable methods for particular data samples, but the results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrate its utility with the VarScan 2 and eXome Hidden Markov Model (XHMM) programs applied to paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
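
The prevalence-independent statistics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions or the unit of comparison, so this assumes the textbook forms: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and the positive likelihood ratio LR+ = sensitivity / (1 - specificity). The names `Counts` and `validation_stats` and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Agreement between exome-derived calls and the array-based reference.

    The unit being counted (exon, target region, or call) is an assumption;
    the abstract does not specify it.
    """
    tp: int  # reference SCNV present, exome method calls it
    fp: int  # reference SCNV absent, exome method calls one
    fn: int  # reference SCNV present, exome method misses it
    tn: int  # reference SCNV absent, exome method makes no call


def validation_stats(c: Counts) -> dict:
    """Return sensitivity, specificity and the positive likelihood ratio."""
    if c.tp + c.fn == 0 or c.tn + c.fp == 0:
        raise ValueError("reference must contain both positives and negatives")
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # 1 - specificity, computed directly from counts to avoid rounding error
    false_positive_rate = c.fp / (c.fp + c.tn)
    # With no false positives the ratio is unbounded; report infinity
    lr_positive = (
        sensitivity / false_positive_rate
        if false_positive_rate > 0
        else float("inf")
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only
    example = Counts(tp=80, fp=10, fn=20, tn=890)
    for name, value in validation_stats(example).items():
        print(f"{name}: {value:.3f}")
```

Because sensitivity and specificity are each computed within one class of the reference standard, the likelihood ratio does not change with the proportion of true SCNVs in the data. That property is what makes it usable as a single metric when comparing methods across samples with different CNV frequencies.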
