Correcting nucleotide-specific biases in high-throughput sequencing data.

Author information

Wang Jeremy R, Quach Bryan, Furey Terrence S

Affiliations

Department of Genetics, University of North Carolina at Chapel Hill, CB 7032, 7314 Medical Biomolecular Research Building, 111 Mason Farm Road, Chapel Hill, 27599, NC, USA.

Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Publication information

BMC Bioinformatics. 2017 Aug 1;18(1):357. doi: 10.1186/s12859-017-1766-x.

Abstract

BACKGROUND

High-throughput sequencing (HTS) data exhibit position-specific nucleotide biases that obscure the intended signal and reduce the effectiveness of these data for downstream analyses. These biases are particularly evident in HTS assays for identifying regulatory regions in DNA (DNase-seq, ChIP-seq, FAIRE-seq, ATAC-seq). Biases may result from many experiment-specific factors, including the selectivity of DNA restriction enzymes and the fragmentation method, as well as sequencing technology-specific factors, such as the choice of adapters/primers and sample amplification methods.

RESULTS

We present a novel method to detect and correct position-specific nucleotide biases in HTS short read data. Our method calculates read-specific weights based on aligned reads to correct the over- or underrepresentation of position-specific nucleotide subsequences, both within and adjacent to the aligned read, relative to a baseline calculated in assay-specific enriched regions. Using HTS data from a variety of ChIP-seq, DNase-seq, FAIRE-seq, and ATAC-seq experiments, we show that our weight-adjusted reads reduce the position-specific nucleotide imbalance across reads and improve the utility of these data for downstream analyses, including identification and characterization of open chromatin peaks and transcription-factor binding sites.
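The reweighting idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single k-mer position, and the names `kmer_frequencies`, `read_weights`, `offset`, and `k` are hypothetical. Each read is weighted by the ratio of the baseline frequency of its k-mer to the frequency observed across all reads, so over-represented k-mers are down-weighted and under-represented ones are up-weighted.

```python
from collections import Counter


def kmer_frequencies(seqs, offset, k):
    """Relative frequency of each k-mer starting at `offset` across `seqs`.

    Sequences too short to contain a full k-mer at `offset` are skipped.
    """
    counts = Counter(s[offset:offset + k] for s in seqs if len(s) >= offset + k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def read_weights(read_seqs, baseline_seqs, offset=0, k=2):
    """Per-read weight = baseline frequency / observed frequency of its k-mer.

    `baseline_seqs` is a hypothetical stand-in for sequences drawn from the
    baseline regions; reads whose k-mer is missing from either distribution
    keep a neutral weight of 1.0 (an assumption of this sketch).
    """
    observed = kmer_frequencies(read_seqs, offset, k)
    baseline = kmer_frequencies(baseline_seqs, offset, k)
    weights = []
    for s in read_seqs:
        kmer = s[offset:offset + k]
        if kmer in observed and kmer in baseline:
            weights.append(baseline[kmer] / observed[kmer])
        else:
            weights.append(1.0)
    return weights


# Toy data: "AC" starts 3 of 4 reads but only 1 of 4 baseline sequences,
# so reads starting with "AC" are down-weighted.
reads = ["ACGT", "ACGG", "ACTT", "TTGA"]
baseline = ["ACGT", "TTGA", "GGCA", "CATG"]
weights = read_weights(reads, baseline, offset=0, k=2)
```

A fuller treatment would combine weights across several positions within and adjacent to the read, as the abstract describes; how those positions are chosen and combined is not specified here, so the sketch leaves it out.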

CONCLUSIONS

A general-purpose method to characterize and correct position-specific nucleotide sequence biases fills the need to systematically recognize and address binding-site preference in the growing number of HTS-based epigenetic assays. As the breadth and impact of these biases become better understood, the availability of a standard toolkit to correct them will be important.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d2f/5540620/d20136b8adf5/12859_2017_1766_Fig1_HTML.jpg
