Li Yi, Xie Xiaohui
BMC Genomics. 2015;16 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2164-16-S2-S1. Epub 2015 Jan 21.
BACKGROUND: Tumor genomes are often highly heterogeneous, consisting of genomes from multiple subclonal types. Complete characterization of all subclonal types is a fundamental need in tumor genome analysis. With the advancement of next-generation sequencing, computational methods have recently been developed to infer tumor subclonal populations directly from cancer genome sequencing data. Most of these methods are based on sequence information from somatic point mutations. However, the accuracy of these algorithms depends crucially on the quality of the somatic mutations returned by variant calling algorithms, and they usually require deep coverage to achieve a reasonable level of accuracy.
RESULTS: We describe a novel probabilistic mixture model, MixClone, for inferring the cellular prevalences of subclonal populations directly from whole genome sequencing of paired normal-tumor samples. MixClone integrates sequence information from somatic copy number alterations and allele frequencies within a unified probabilistic framework. We demonstrate the utility of the method using both simulated and real cancer sequencing datasets, and show that it significantly outperforms existing methods for inferring tumor subclonal populations. The MixClone package is written in Python and is publicly available at https://github.com/uci-cbcl/MixClone.
CONCLUSIONS: The probabilistic mixture model proposed here provides a new framework for subclonal analysis based on cancer genome sequencing data. By applying the method to both simulated and real cancer sequencing data, we show that integrating sequence information from both somatic copy number alterations and allele frequencies can significantly improve the accuracy of inferring tumor subclonal populations.
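To illustrate why copy number and allele frequency together carry information about cellular prevalence, here is a minimal Python sketch of a standard two-population mixing calculation. It is not taken from the MixClone package and is not the paper's model: the function name `expected_signals`, its parameters, and the assumption of a diploid, heterozygous normal population are all hypothetical choices made for illustration.

```python
def expected_signals(prevalence, tumor_copies, tumor_b_copies,
                     normal_copies=2, normal_b_copies=1):
    """Return (average copy number, expected B-allele frequency) for a
    sample in which a fraction `prevalence` of cells carry the tumor
    genotype and the rest carry the normal genotype.

    Illustrative assumption: the normal cells are diploid and
    heterozygous at the site (2 copies, 1 of them the B allele).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if not 0 <= tumor_b_copies <= tumor_copies:
        raise ValueError("tumor_b_copies must be between 0 and tumor_copies")

    # Average the total and B-allele copy counts over the two cell populations.
    avg_copies = (1.0 - prevalence) * normal_copies + prevalence * tumor_copies
    avg_b = (1.0 - prevalence) * normal_b_copies + prevalence * tumor_b_copies

    if avg_copies == 0:
        raise ValueError("no copies of the segment remain in the sample")
    return avg_copies, avg_b / avg_copies


if __name__ == "__main__":
    # Single-copy gain (3 copies, 2 of them B) in half of the cells:
    # about 2.5 copies on average, B-allele frequency about 0.6.
    print(expected_signals(0.5, tumor_copies=3, tumor_b_copies=2))

    # Copy-neutral loss of heterozygosity in 40% of the cells:
    # the average copy number stays at 2, but the B-allele frequency
    # shifts to about 0.7, so only the allele signal reveals the event.
    print(expected_signals(0.4, tumor_copies=2, tumor_b_copies=2))
```

In this toy calculation the second case shows an event that is invisible in copy number alone but visible in allele frequency, which is consistent with the abstract's claim that combining both signals improves inference.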