Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland, Department of Genetic Medicine and Development, University of Geneva Medical School, Institute of Genetics and Genomics in Geneva, University of Geneva, 1211, Geneva, Switzerland and Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, 1011, Lausanne, Switzerland.
Bioinformatics. 2014 Jan 15;30(2):165-71. doi: 10.1093/bioinformatics/btt667. Epub 2013 Nov 18.
High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts.
We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites were affected by amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy that filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter
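The two filtering criteria named in the abstract (low read complexity, and a significant excess of clonal reads on one allele) can be illustrated with a minimal Python sketch. This is not the absfilter implementation and does not reproduce its API. It rests on several assumptions made purely for illustration: clonal reads are taken to be reads sharing the same start position on the same allele, complexity is the fraction of reads with a distinct start position, the excess is assessed with a two-sided Fisher's exact test on an allele-by-clonality table, and the function names and the thresholds `min_complexity` and `alpha` are hypothetical.

```python
from math import comb


def count_unique_and_clonal(starts):
    """Split reads into distinct start positions and clonal (repeated) reads.

    Assumption: reads with an identical start position are clonal copies.
    """
    unique = len(set(starts))
    return unique, len(starts) - unique


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def is_amplification_biased(ref_starts, alt_starts,
                            min_complexity=0.5, alpha=0.05):
    """Return True if a heterozygous site should be filtered out.

    ref_starts / alt_starts: start positions of reads carrying each allele.
    min_complexity and alpha are illustrative thresholds, not published values.
    """
    total = len(ref_starts) + len(alt_starts)
    if total == 0:
        return True  # no reads, nothing to support an allele-specific call

    ref_unique, ref_clonal = count_unique_and_clonal(ref_starts)
    alt_unique, alt_clonal = count_unique_and_clonal(alt_starts)

    # Criterion 1: low read complexity at the site.
    if (ref_unique + alt_unique) / total < min_complexity:
        return True

    # Criterion 2: clonal reads significantly over-represented on one allele.
    p_value = fisher_exact_two_sided(ref_clonal, ref_unique,
                                     alt_clonal, alt_unique)
    return p_value < alpha
```

Under these assumptions, a site where 18 of 20 reference-allele reads share one start position while all alternate-allele reads are distinct would be flagged, whereas a site with only distinct start positions on both alleles would pass. A real analysis would also need to account for strand and paired-end fragment coordinates, which this sketch ignores.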