DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter.

Author information

Quach Bryan, Furey Terrence S

Affiliations

Curriculum in Bioinformatics and Computational Biology.

Department of Genetics.

Publication information

Bioinformatics. 2017 Apr 1;33(7):956-963. doi: 10.1093/bioinformatics/btw740.

Abstract

MOTIVATION

Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNase I sequencing (DNase-seq) and Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) produce genome-wide data that include distinct 'footprint' patterns at binding sites. Nearly all existing computational methods to detect footprints from these data assume that footprint signals are highly homogeneous across footprint sites. Additionally, no comprehensive and systematic comparison has been performed of how well footprinting methods identify which motif sites for a given factor are bound.

RESULTS

Using DNase-seq data from the ENCODE project, we show that a large degree of previously uncharacterized site-to-site variability exists in footprint signal across motif sites for a transcription factor. To model this heterogeneity in the data, we introduce a novel, supervised learning footprinter called Detecting Footprints Containing Motifs (DeFCoM). We compare DeFCoM to nine existing methods using evaluation sets from four human cell lines and eighteen transcription factors and show that DeFCoM outperforms current methods in distinguishing bound from unbound motif sites. We also analyze the impact of several biological and technical factors on the quality of footprint predictions to highlight important considerations when conducting footprint analyses and assessing the performance of footprint prediction methods. Finally, we show that DeFCoM can detect footprints using ATAC-seq data with accuracy similar to that obtained with DNase-seq data.
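The abstract does not specify DeFCoM's features or classifier, so the following is only a minimal sketch of the general idea of supervised footprinting: learn from labeled motif sites to separate bound from unbound sites using the cut-count profile around the motif. Everything here is an assumption made for illustration: the synthetic data, the window layout (`WIDTH`, `MOTIF`), the two features, and the choice of logistic regression. None of it is taken from the DeFCoM code.

```python
import math
import random

WIDTH = 60          # hypothetical window size around a motif site (bp)
MOTIF = (25, 35)    # hypothetical motif position inside the window

def simulate_site(bound, rng):
    """Synthetic per-base cut counts around one motif site (illustrative only)."""
    depth = rng.choice([2.0, 5.0, 10.0])   # site-to-site variability in signal
    profile = []
    for pos in range(WIDTH):
        rate = depth
        if bound and MOTIF[0] <= pos < MOTIF[1]:
            rate *= 0.2                     # protected bases are cut less often
        profile.append(max(0.0, rng.gauss(rate, math.sqrt(rate))))
    return profile

def features(profile):
    """Two simple features: center-to-flank cut ratio and log total signal."""
    center = profile[MOTIF[0]:MOTIF[1]]
    flank = profile[:MOTIF[0]] + profile[MOTIF[1]:]
    center_mean = sum(center) / len(center)
    flank_mean = sum(flank) / len(flank)
    return [center_mean / (flank_mean + 1e-9), math.log1p(sum(profile))]

def fit_scaler(rows):
    """Per-feature mean and standard deviation, estimated on training data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(zip(*rows), means)]
    return means, stds

def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(x, weights, bias):
    """Return 1 (bound) or 0 (unbound) for one scaled feature vector."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0

rng = random.Random(0)
labels = [i % 2 for i in range(400)]        # 1 = bound, 0 = unbound
raw = [features(simulate_site(bool(y), rng)) for y in labels]
train_x, test_x = raw[:300], raw[300:]
train_y, test_y = labels[:300], labels[300:]

means, stds = fit_scaler(train_x)
weights, bias = train_logistic([scale(r, means, stds) for r in train_x], train_y)
preds = [predict(scale(r, means, stds), weights, bias) for r in test_x]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real analysis the labels would come from an independent assay and the profiles from aligned DNase-seq or ATAC-seq reads. The sketch only shows why a trained model can tolerate site-to-site differences in signal depth that a single fixed footprint template would not.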

AVAILABILITY AND IMPLEMENTATION

Python code available at https://bitbucket.org/bryancquach/defcom.

CONTACT

bquach@email.unc.edu or tsfurey@email.unc.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


