Zhang Mucheng, Liu Deli, Tang Jie, Feng Yuan, Wang Tianfang, Dobbin Kevin K, Schliekelman Paul, Zhao Shaying
Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA.
Department of Biostatistics, University of Georgia, Athens, GA 30602-7229, USA.
Comput Struct Biotechnol J. 2018 Sep 7;16:335-341. doi: 10.1016/j.csbj.2018.09.001. eCollection 2018.
As next-generation sequencing technology advances and its cost decreases, whole genome sequencing (WGS) has become the preferred platform for identifying somatic copy number alteration (CNA) events in cancer genomes. To decipher these massive sequencing data more effectively, we developed a software program named SEG, short for "segment". SEG uses mapped read or fragment density for CNA discovery. To reduce CNA artifacts arising from sequencing and mapping biases, SEG first normalizes the data by taking the log-ratio of each tumor density against its matching normal density. SEG then uses dynamic programming to find change-points in the contiguous log-ratio data series along a chromosome, dividing the chromosome into segments. Finally, SEG identifies the segments that carry CNAs. Our analyses with both simulated and real sequencing data indicate that SEG finds more small CNAs than other published software tools.
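The three steps described in the abstract (log-ratio normalization, change-point detection by dynamic programming, and segment calling) can be illustrated with a minimal sketch. This is not SEG's implementation: the abstract does not state the objective function, the penalty, or the calling rule, so the sketch assumes a generic penalized least-squares segmentation and a fixed log-ratio threshold. The function names (`log_ratios`, `segment`, `call_cna`) and the parameters `pseudocount`, `penalty`, and `threshold` are hypothetical.

```python
import math


def log_ratios(tumor, normal, pseudocount=0.5):
    """Per-window log2(tumor / normal). The pseudocount is an assumption
    used here only to avoid log(0) and division by zero."""
    return [math.log2((t + pseudocount) / (n + pseudocount))
            for t, n in zip(tumor, normal)]


def segment(values, penalty):
    """Dynamic programming over all partitions of `values`.

    Minimizes (sum of squared deviations from each segment mean)
    + penalty * (number of segments). Returns half-open (start, end)
    index pairs. Runs in O(n^2) time. Assumed objective, not SEG's."""
    n = len(values)
    # Prefix sums let us compute any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for k, v in enumerate(values):
        s1[k + 1] = s1[k] + v
        s2[k + 1] = s2[k] + v * v

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)            # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                prev[j] = i

    # Walk back through the stored segment starts.
    bounds = []
    j = n
    while j > 0:
        bounds.append((prev[j], j))
        j = prev[j]
    bounds.reverse()
    return bounds


def call_cna(values, segments, threshold=0.3):
    """Label each segment by its mean log-ratio (threshold is an assumption)."""
    calls = []
    for i, j in segments:
        mean = sum(values[i:j]) / (j - i)
        if mean > threshold:
            state = "gain"
        elif mean < -threshold:
            state = "loss"
        else:
            state = "neutral"
        calls.append((i, j, mean, state))
    return calls


# Toy data: 14 windows, with a doubled tumor density in windows 5 to 8.
tumor = [100] * 5 + [200] * 4 + [100] * 5
normal = [100] * 14
ratios = log_ratios(tumor, normal)
segments = segment(ratios, penalty=0.5)
calls = call_cna(ratios, segments)
```

On this toy input, the segmentation recovers the two change-points around the amplified windows, and only the middle segment is labeled as a gain. The penalty controls sensitivity: a smaller value allows shorter segments but also admits more noise-driven breakpoints.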