Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China.
University of Chinese Academy of Sciences, Beijing, 100049, China.
BMC Bioinformatics. 2021 Jan 15;22(1):23. doi: 10.1186/s12859-020-03924-5.
Copy number alterations (CNAs), owing to their large impact on the genome, are an important contributing factor in oncogenesis and metastasis. Detecting genomic alterations from shallow-sequencing data of low-purity tumor samples remains a challenging task.
We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts many robust statistical techniques, such as kernel smoothing of coverage differentiation information, to discern signal from noise, and it combines ideas from time-series analysis and signal processing to derive a range of estimates for the period of a histogram of coverage differentiation information. Statistical learning methods, including a tiered Gaussian mixture model, the expectation-maximization algorithm, and sparse Bayesian learning, were customized and built into the model. Accucopy is implemented in C++/Rust, is packaged in a Docker image, and supports non-human samples (more information at http://www.yfish.org/software/).
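The period-estimation idea described above can be illustrated with a minimal Python sketch. It is not Accucopy's implementation. It assumes that the coverage differentiation values are per-window tumor/normal coverage ratios whose histogram shows roughly evenly spaced peaks (one per copy-number state). It also uses a Gaussian kernel density estimate for the smoothing step and the autocorrelation of the smoothed histogram as the time-series tool. The function names, bandwidth, grid step, and synthetic data are all hypothetical.

```python
import math
import random

# Minimal sketch, NOT Accucopy's code. Assumption: the input values are
# per-window coverage ratios whose histogram has evenly spaced peaks.


def gaussian_kernel_smooth(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def autocorrelation(signal, max_lag):
    """Sample autocorrelation of the mean-centered signal for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def candidate_periods(acf, step, top_k=3):
    """Convert the top_k local maxima of the ACF (lag >= 1) into period estimates."""
    peaks = [
        lag
        for lag in range(1, len(acf) - 1)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    # Strongest periodic signal first; the list as a whole is a range of estimates.
    peaks.sort(key=lambda lag: acf[lag], reverse=True)
    return [lag * step for lag in peaks[:top_k]]


# Hypothetical usage: synthetic ratios clustered around four evenly spaced states.
rng = random.Random(0)
ratios = [rng.gauss(mu, 0.03) for mu in (0.5, 1.0, 1.5, 2.0) for _ in range(200)]
step = 0.01
grid = [i * step for i in range(301)]  # histogram support from 0.0 to 3.0
density = gaussian_kernel_smooth(ratios, grid, bandwidth=0.05)
acf = autocorrelation(density, max_lag=150)
periods = candidate_periods(acf, step)
print("candidate periods:", [round(p, 2) for p in periods])
```

For these synthetic clusters, spaced 0.5 apart, the leading candidate should fall close to 0.5. Returning the top few autocorrelation peaks, rather than a single value, mirrors the idea of deriving a range of candidate periods that a downstream model can then evaluate.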
We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage, low-purity tumor sequencing data. Through comparative analyses on both simulated and real sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.