Bioinformatics and Genomics Program, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA.
Sci Rep. 2017 Apr 13;7(1):885. doi: 10.1038/s41598-017-01005-x.
Whole Exome Sequencing (WES) is a powerful clinical diagnostic tool for discovering the genetic basis of many diseases. A major shortcoming of WES is the uneven coverage of sequence reads over the exome targets. This unevenness leaves many regions with low coverage, which hinders accurate variant calling. In this study, we devised two novel metrics, the Cohort Coverage Sparseness (CCS) score and the Unevenness (U) score, for a detailed assessment of how sequence-read coverage is distributed. Using these metrics, we revealed non-uniform coverage and low-coverage regions in WES data generated by three different platforms. This non-uniformity is both local (the coverage of a given exon varies across platforms) and global (the coverage of all exons across the genome varies within a given platform). Low-coverage regions encompassing functionally important genes were often associated with high GC content, repeat elements, and segmental duplications. Although most of the problems associated with WES stem from limitations of the capture methods, further refinements in WES technologies have the potential to enhance its clinical applications.