Chen Xi, Shi Xu, Hilakivi-Clarke Leena, Shajahan-Haq Ayesha N, Clarke Robert, Xuan Jianhua
Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA.
Bioinformatics. 2017 Jan 15;33(2):177-183. doi: 10.1093/bioinformatics/btw605. Epub 2016 Sep 21.
Whole-genome DNA sequencing (WGS) of paired tumor and normal samples has enabled the identification of somatic DNA changes in unprecedented detail. Large-scale identification of somatic structural variations (SVs) for a specific cancer type will deepen our understanding of driver mechanisms in cancer progression. However, the limited number of WGS samples, insufficient read coverage, and the impurity of tumor samples, which contain both normal and neoplastic cells, limit the reliable and accurate detection of somatic SVs.
We present a novel pattern-based probabilistic approach, PSSV, to identify somatic structural variations from WGS data. PSSV features a mixture model with hidden states representing different mutation patterns; PSSV can thus differentiate heterozygous and homozygous SVs in each sample, enabling the identification of those somatic SVs with heterozygous mutations in normal samples and homozygous mutations in tumor samples. Simulation studies demonstrate that PSSV outperforms existing tools. PSSV has been successfully applied to breast cancer data to identify somatic SVs of key factors associated with breast cancer development.
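To make the idea of hidden states as mutation patterns concrete, the following is a minimal illustrative sketch and not the PSSV model itself. It assumes each hidden state is a (normal genotype, tumor genotype) pair, that the number of SV-supporting reads follows a Poisson distribution, and that tumor purity mixes the two genotypes linearly. The function names, the `noise` term, the uniform prior, and all numeric values are hypothetical choices made for this example only.

```python
import math
from itertools import product

# Copies of the SV allele: 0 = absent, 1 = heterozygous, 2 = homozygous.
GENOTYPES = (0, 1, 2)


def poisson_logpmf(k, mean):
    """Log-probability of observing k events under a Poisson with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def pattern_posteriors(normal_reads, tumor_reads, coverage, purity, noise=0.5):
    """Posterior over (normal, tumor) genotype patterns under a uniform prior.

    Assumption: the expected number of SV-supporting reads is proportional to
    the SV allele fraction, plus a small background `noise` rate.
    """
    log_post = {}
    for g_n, g_t in product(GENOTYPES, repeat=2):
        normal_mean = coverage * g_n / 2 + noise
        # An impure tumor sample is a mix of tumor cells and normal cells.
        tumor_fraction = purity * g_t / 2 + (1 - purity) * g_n / 2
        tumor_mean = coverage * tumor_fraction + noise
        log_post[(g_n, g_t)] = (poisson_logpmf(normal_reads, normal_mean)
                                + poisson_logpmf(tumor_reads, tumor_mean))
    # Normalize in log space to avoid underflow.
    top = max(log_post.values())
    weights = {state: math.exp(v - top) for state, v in log_post.items()}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical read counts at one candidate SV locus.
    posteriors = pattern_posteriors(normal_reads=14, tumor_reads=27,
                                    coverage=30, purity=0.8)
    best = max(posteriors, key=posteriors.get)
    print("most probable (normal, tumor) pattern:", best)
```

In this toy setting, a pattern such as (1, 2) corresponds to the case described above: heterozygous in the normal sample and homozygous in the tumor sample. A realistic model would also need to estimate purity and priors from data rather than fixing them.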
An R package of PSSV is available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.