Chu Yanshuo, Nie Chenxi, Wang Yadong
Center of Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
Front Genet. 2020 Feb 27;10:1374. doi: 10.3389/fgene.2019.01374. eCollection 2019.
State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alterations (SCNAs), not only because they must simultaneously estimate the subclonal population frequency and the absolute copy number of each SCNA, but also because the tumor and its paired normal sequencing data contain complex bias and noise. Both existing NGS-based SCNA detection methods and tools for inferring the subclonal population frequency of SCNAs use the read count ratio (RCR) of the tumor to its paired normal as the key feature of tumor sequencing data; however, sequencing error and bias strongly affect the RCR, producing a large number of redundant SCNA segments that make the subsequent inference of subclonal population frequencies and the subclonal reconstruction time-consuming and inaccurate. We perform a mathematical analysis of the number of solutions for the subclonal frequency of an SCNA and, based on it, propose a computational algorithm to reduce the impact of false breakpoints. We construct a new probability model that incorporates the RCR bias correction algorithm and, by chaining it with the false breakpoint filtering algorithm, build a complete SCNA subclonal population reconstruction pipeline. Experimental results show that our pipeline outperforms existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.
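To illustrate why the subclonal frequency and the absolute copy number have to be estimated together, the following minimal sketch enumerates the (copy number, frequency) pairs that explain one observed ratio. It is not the probability model of this work: it assumes a depth-normalized, bias-free ratio, a diploid normal genome, and a single subclone carrying the alteration, and the function names `expected_rcr` and `candidate_solutions` are hypothetical.

```python
from fractions import Fraction


def expected_rcr(copy_number, phi, normal_copies=2):
    """Expected tumor/normal ratio for a segment (simplifying assumption).

    A fraction `phi` of the cells carries `copy_number` copies of the
    segment; the remaining cells carry `normal_copies` copies.
    """
    mixed_copies = phi * copy_number + (1 - phi) * normal_copies
    return mixed_copies / normal_copies


def candidate_solutions(observed_rcr, max_copy_number=6, normal_copies=2):
    """Return every (copy_number, phi) pair with 0 < phi <= 1 that
    reproduces `observed_rcr` under the mixture assumed above."""
    solutions = []
    for c in range(max_copy_number + 1):
        if c == normal_copies:
            continue  # no alteration, so phi is undefined for this state
        # Solve observed_rcr = expected_rcr(c, phi) for phi.
        phi = normal_copies * (observed_rcr - 1) / (c - normal_copies)
        if 0 < phi <= 1:
            solutions.append((c, phi))
    return solutions


# Exact arithmetic avoids floating-point noise in the comparison.
observed = Fraction(5, 4)
for c, phi in candidate_solutions(observed):
    print(f"copy number {c}: subclonal frequency {phi}")
```

Under these assumptions a single ratio of 5/4 is explained equally well by several copy number states, each with a different frequency, so the ratio alone does not determine the solution.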