Xu Bo, Cai Hongmin, Zhang Changsheng, Yang Xi, Han Guoqiang
School of Computer Science & Engineering, South China University of Technology, Guangzhou, China.
Comput Biol Chem. 2016 Aug;63:15-20. doi: 10.1016/j.compbiolchem.2016.02.007. Epub 2016 Feb 17.
Variations in DNA copy number (copy number variations, CNVs) carry important information on genome evolution and the regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology allows one to explore gene expression heterogeneity among single cells, thus providing important information on cancer cell evolution. Single-cell DNA/RNA sequencing data usually have low genome coverage, which requires an extra amplification step to accumulate enough samples. However, such amplification introduces large bias and makes bioinformatics analysis challenging. Accurately modeling the distribution of the sequencing data and effectively suppressing the influence of this bias are the keys to successful variation analysis. Recent advances demonstrate that the technical noise introduced by amplification is more likely to follow a negative binomial distribution, which arises as a Poisson distribution with a gamma-distributed rate. We therefore tackle the problem of CNV detection by formulating it as a quadratic optimization problem with two constraints, in which the underlying signals are corrupted by Poisson-distributed noise. By imposing constraints of sparsity and smoothness, the read depth signals reconstructed from single-cell sequencing data are expected to fit CNV patterns more accurately. An efficient numerical solution based on the classical alternating direction method of multipliers (ADMM) is tailored to solve the proposed model. We demonstrate the advantages of the proposed method on both synthetic and empirical single-cell sequencing data. The experimental results indicate that the proposed method performs well on single-cell sequencing data.
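The abstract does not state the exact objective function, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a fused-lasso-style stand-in: a quadratic data term plus L1 penalties on the signal (sparsity) and on differences between adjacent bins (smoothness), solved with a standard ADMM splitting. The function names, the penalty weights lam_sparse and lam_smooth, the step parameter rho, and the simulated read depths are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def simulate_read_depth(n_bins=200, baseline=50.0, gain=(80, 120), seed=0):
    """Simulate Poisson read counts per bin with one copy-number gain.

    Returns the normalized deviation from baseline depth and the true signal
    (0 = normal copy number, 1 = doubled depth). All values are illustrative.
    """
    rng = np.random.default_rng(seed)
    truth = np.zeros(n_bins)
    truth[gain[0]:gain[1]] = 1.0
    counts = rng.poisson(baseline * (1.0 + truth))
    y = counts / baseline - 1.0
    return y, truth


def fused_lasso_admm(y, lam_sparse=0.05, lam_smooth=1.0, rho=1.0, n_iter=500):
    """Approximately minimize, by ADMM,

        0.5 * ||y - x||^2 + lam_sparse * ||x||_1 + lam_smooth * ||D x||_1,

    where D takes differences between adjacent bins. This is an assumed
    stand-in objective, not the one from the paper.
    """
    n = y.size
    D = np.diff(np.eye(n), axis=0)                 # (n-1) x n difference matrix
    A = (1.0 + rho) * np.eye(n) + rho * D.T @ D    # system matrix of the x-update
    x = y.copy()
    z1, u1 = np.zeros(n), np.zeros(n)              # split copy of x and its scaled dual
    z2, u2 = np.zeros(n - 1), np.zeros(n - 1)      # split copy of D x and its scaled dual
    for _ in range(n_iter):
        # x-update: a quadratic problem, solved exactly as a linear system
        rhs = y + rho * (z1 - u1) + rho * D.T @ (z2 - u2)
        x = np.linalg.solve(A, rhs)
        Dx = D @ x
        # z-updates: soft-thresholding enforces sparsity and piecewise constancy
        z1 = soft_threshold(x + u1, lam_sparse / rho)
        z2 = soft_threshold(Dx + u2, lam_smooth / rho)
        # scaled dual updates
        u1 += x - z1
        u2 += Dx - z2
    return x


if __name__ == "__main__":
    y, truth = simulate_read_depth()
    x_hat = fused_lasso_admm(y)
    print("MSE of raw signal:     ", np.mean((y - truth) ** 2))
    print("MSE of denoised signal:", np.mean((x_hat - truth) ** 2))
```

In this sketch the smoothness penalty pulls neighbouring bins toward a common value, producing piecewise-constant segments, while the sparsity penalty pulls bins with normal copy number toward zero. A noise model matched to Poisson or negative binomial counts, as the abstract describes, would replace the plain quadratic data term used here.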