Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA.
Bioinformatics. 2022 Oct 14;38(20):4677-4686. doi: 10.1093/bioinformatics/btac586.
Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas, the Broad Institute Genome Characterization Center developed the Tangent normalization method to generate copy-number profiles using data from single-nucleotide polymorphism (SNP) arrays and whole-exome sequencing (WES) technologies for over 10 000 pairs of tumors and matched normal samples. Here, we describe the Tangent method, which uses a unique linear combination of normal samples as a reference for each tumor sample, to subtract systematic errors that vary across samples. We also describe a modification of Tangent, called Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available.
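The idea of a per-tumor reference built as a linear combination of normal samples can be illustrated with a minimal sketch. It assumes the combination is chosen by ordinary least squares on log2 copy-ratio profiles, which is one natural reading of the description above and not necessarily how the released Tangent code works. The function name `tangent_normalize` and all synthetic data are hypothetical.

```python
import numpy as np


def tangent_normalize(tumor, normals):
    """Subtract from `tumor` its least-squares fit by the normal samples.

    tumor:   (n_markers,) log2 copy-ratio profile of one tumor sample
    normals: (n_markers, n_normals) log2 copy-ratio profiles of normals
    Assumption: the reference is the least-squares linear combination.
    """
    # Weights of the combination of normals that best matches this tumor
    weights, *_ = np.linalg.lstsq(normals, tumor, rcond=None)
    reference = normals @ weights  # tumor-specific reference profile
    return tumor - reference, weights


# Synthetic example; every value here is made up for illustration.
rng = np.random.default_rng(0)
n_markers, n_normals = 2000, 5

# Two systematic noise patterns shared, in varying amounts, by all samples
noise_basis = rng.normal(size=(n_markers, 2))
mixing = rng.normal(size=(2, n_normals))
normals = noise_basis @ mixing + 0.01 * rng.normal(size=(n_markers, n_normals))

signal = np.zeros(n_markers)
signal[500:800] = 1.0  # simulated copy-number gain
tumor = signal + noise_basis @ np.array([0.5, -0.3])

cleaned, weights = tangent_normalize(tumor, normals)
err_before = np.linalg.norm(tumor - signal)
err_after = np.linalg.norm(cleaned - signal)
print(f"error before: {err_before:.2f}, after: {err_after:.2f}")
```

Because the weights are refit for every tumor, the reference adapts to how much of each noise pattern that particular sample carries, which a single fixed reference cannot do. In this sketch the small part of the true signal that happens to lie in the span of the normals is also subtracted.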
Tangent normalization substantially increases signal-to-noise ratios (SNRs) compared with conventional normalization methods in both SNP array and WES analyses. Tangent and Pseudo-Tangent normalizations improve the SNR by reducing noise with minimal effect on signal, and this improvement exceeds the contribution of other steps in the analysis, such as the choice of segmentation algorithm. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data.
Tangent is available at https://github.com/broadinstitute/tangent and as a Docker image (https://hub.docker.com/r/broadinstitute/tangent). Tangent is also the normalization method for the copy-number pipeline in Genome Analysis Toolkit 4 (GATK4).
Supplementary data are available at Bioinformatics online.