Multiresolution correction of GC bias and application to identification of copy number alterations.

Affiliation

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea.

Publication information

Bioinformatics. 2019 Oct 15;35(20):3890-3897. doi: 10.1093/bioinformatics/btz174.

Abstract

MOTIVATION

Whole-genome sequencing (WGS) data are affected by sequencing biases such as GC bias and mappability bias. These biases degrade the detection of genetic variations such as copy number alterations. Existing methods correct GC bias by fitting regression models to the relation between the GC proportion and the depth of coverage (DOC) of markers. However, the severity of GC bias varies from sample to sample. We developed a new method for GC bias correction based on multiresolution analysis. We used a translation-invariant wavelet transform to decompose the biased raw signals into high- and low-frequency coefficients. We then modeled the relation between the GC proportion and the DOC of genomic regions and constructed new control DOC signals that reflect the GC bias. These control DOC signals are used to normalize the genomic sequences, thereby correcting the GC bias.
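
The decomposition step can be illustrated with a minimal sketch. This is not the authors' released code: it assumes a Haar filter, circular boundary handling, and a one-dimensional array of per-bin DOC values, none of which are specified in the abstract. The function names `haar_swt` and `reconstruct` are hypothetical. The sketch shows only how an undecimated (translation-invariant) transform splits a signal into one low-frequency component and several high-frequency components of the same length.

```python
import numpy as np

def haar_swt(signal, levels):
    """Undecimated Haar decomposition (a trous scheme, circular boundary).

    Assumption: Haar filter and wrap-around edges are illustrative choices,
    not taken from the paper. Returns the low-frequency approximation and a
    list of high-frequency detail arrays, each as long as the input.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for j in range(levels):
        # Compare each sample with its neighbour 2**j positions ahead.
        shifted = np.roll(approx, -(2 ** j))
        details.append((approx - shifted) / 2.0)  # high-frequency part
        approx = (approx + shifted) / 2.0         # low-frequency part
    return approx, details

def reconstruct(approx, details):
    """Invert haar_swt: the components sum back to the original signal."""
    return approx + np.sum(details, axis=0)

# Toy DOC track (hypothetical data): smooth trend plus noise.
rng = np.random.default_rng(0)
doc = 30 + 5 * np.sin(np.linspace(0, 4 * np.pi, 64)) + rng.normal(size=64)
low, highs = haar_swt(doc, levels=3)
```

Because no downsampling occurs, circularly shifting the input shifts every coefficient array by the same amount, which is the translation-invariance property the abstract refers to. In practice a library routine such as a stationary wavelet transform would likely replace this hand-written loop.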

RESULTS

When we applied our method to simulated sequencing data with various degrees of GC bias, it corrected the GC bias more robustly than the other methods did. We also applied our method to real-world cancer sequencing datasets and successfully identified cancer-related focal alterations even when the cancer genomes were not normalized to normal control samples. In conclusion, our method can be applied to WGS data with different degrees of GC bias.

AVAILABILITY AND IMPLEMENTATION

The code is available at http://gcancer.org/wabico.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
