ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA.
BMC Bioinformatics. 2022 Jul 19;23(1):285. doi: 10.1186/s12859-022-04820-w.
Copy number variants (CNVs) play a significant role in human heredity and disease. However, sensitive and specific characterization of germline CNVs from next-generation sequencing (NGS) data has remained challenging, particularly for hybridization-capture data, in which read counts are the primary source of copy number information.
We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs ranging in size from a single exon to a full chromosome.
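The difference between the two decoders can be illustrated with a toy example. The sketch below is not Cobalt's implementation: it assumes a three-state HMM (copy number 1, 2, 3), Gaussian emissions on a normalized depth ratio with one shared standard deviation, and hypothetical transition and initial probabilities chosen only for illustration. Viterbi returns the single most probable state path, while pointwise MAP picks, at each target independently, the state with the highest posterior marginal from the forward-backward algorithm.

```python
import math

# All parameters below are hypothetical, chosen only for illustration.
COPY_NUMBERS = (1, 2, 3)                 # deletion, normal diploid, duplication
MEAN_RATIO = {1: 0.5, 2: 1.0, 3: 1.5}    # expected depth ratio per copy number
SD = 0.15                                # noise level shared by all states
STAY = 0.99                              # self-transition probability


def log_emit(x, cn):
    """Log Gaussian density of depth ratio x under copy number cn."""
    z = (x - MEAN_RATIO[cn]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2 * math.pi))


def log_trans(a, b):
    """Log transition probability; leaving a state splits evenly over the others."""
    return math.log(STAY if a == b else (1 - STAY) / (len(COPY_NUMBERS) - 1))


def log_init(cn):
    """Log initial probability, strongly favoring the diploid state."""
    return math.log(0.98 if cn == 2 else 0.01)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def viterbi(obs):
    """Return the single most probable state path."""
    score = {s: log_init(s) + log_emit(obs[0], s) for s in COPY_NUMBERS}
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in COPY_NUMBERS:
            best = max(COPY_NUMBERS, key=lambda p: prev[p] + log_trans(p, s))
            ptr[s] = best
            score[s] = prev[best] + log_trans(best, s) + log_emit(x, s)
        back.append(ptr)
    state = max(COPY_NUMBERS, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def posteriors(obs):
    """Return per-target posterior marginals via forward-backward (log space)."""
    fwd = [{s: log_init(s) + log_emit(obs[0], s) for s in COPY_NUMBERS}]
    for x in obs[1:]:
        prev = fwd[-1]
        fwd.append({
            s: log_emit(x, s)
            + logsumexp([prev[p] + log_trans(p, s) for p in COPY_NUMBERS])
            for s in COPY_NUMBERS
        })
    bwd = [{s: 0.0 for s in COPY_NUMBERS}]
    for x in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {
            s: logsumexp([log_trans(s, q) + log_emit(x, q) + nxt[q]
                          for q in COPY_NUMBERS])
            for s in COPY_NUMBERS
        })
    total = logsumexp(list(fwd[-1].values()))
    return [
        {s: math.exp(fwd[t][s] + bwd[t][s] - total) for s in COPY_NUMBERS}
        for t in range(len(obs))
    ]


def pointwise_map(obs):
    """Return, for each target, the state with the highest posterior marginal."""
    return [max(COPY_NUMBERS, key=post.get) for post in posteriors(obs)]


if __name__ == "__main__":
    depth_ratios = [1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0]
    print("Viterbi:", viterbi(depth_ratios))
    print("PMAP:   ", pointwise_map(depth_ratios))
```

On clear signals the two decoders agree. They can diverge on short or noisy events, because the posterior marginal at a target sums over every path passing through a given state rather than scoring one path as a whole. The per-target posteriors also give a natural confidence value for each call.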
In both the simulated and the previously detected CNV studies, Cobalt shows similar sensitivity to other callers but significantly fewer false positive detections. Overall sensitivity is 80-90% for deletion CNVs spanning 1-4 targets and 90-100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs.