DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter.

Author information

Quach Bryan, Furey Terrence S

Affiliations

Curriculum in Bioinformatics and Computational Biology.

Department of Genetics.

Publication information

Bioinformatics. 2017 Apr 1;33(7):956-963. doi: 10.1093/bioinformatics/btw740.

DOI:10.1093/bioinformatics/btw740

PMID:27993786

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6075477/

Abstract

MOTIVATION

Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNase I sequencing (DNase-seq) and the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) produce genome-wide data that include distinct 'footprint' patterns at binding sites. Nearly all existing computational methods for detecting footprints in these data assume that footprint signals are highly homogeneous across footprint sites. Additionally, no comprehensive, systematic comparison has assessed how well footprinting methods identify which motif sites of a given factor are bound.

RESULTS

Using DNase-seq data from the ENCODE project, we show that footprint signal varies substantially from site to site across the motif sites of a transcription factor, and that this variability was previously uncharacterized. To model this heterogeneity in the data, we introduce a novel, supervised learning footprinter called Detecting Footprints Containing Motifs (DeFCoM). We compare DeFCoM to nine existing methods using evaluation sets from four human cell lines and eighteen transcription factors and show that DeFCoM outperforms current methods in distinguishing bound from unbound motif sites. We also analyze the impact of several biological and technical factors on the quality of footprint predictions to highlight important considerations when conducting footprint analyses and assessing the performance of footprint prediction methods. Finally, we show that DeFCoM can detect footprints using ATAC-seq data with accuracy similar to that achieved with DNase-seq data.
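
The supervised, motif-centric framing described above can be illustrated with a toy sketch. This is not DeFCoM's feature set or model: the function names, the two summary features, and the nearest-centroid classifier are assumptions chosen for brevity, the cleavage profiles are made-up numbers, and the bound/unbound labels are assumed to come from an independent source. The sketch only shows the general idea of learning from labeled motif sites whose footprint signal differs from site to site.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def footprint_features(cuts: Sequence[int], motif_start: int,
                       motif_end: int, flank: int = 5) -> Vector:
    """Summarize per-base cleavage counts around one motif site.

    Returns [mean flank signal, flank mean minus motif mean].
    Both features are illustrative assumptions, not DeFCoM's features.
    """
    center = list(cuts[motif_start:motif_end])
    flanks = (list(cuts[max(0, motif_start - flank):motif_start])
              + list(cuts[motif_end:motif_end + flank]))
    if not center or not flanks:
        raise ValueError("motif or flank window is empty")
    center_mean = sum(center) / len(center)
    flank_mean = sum(flanks) / len(flanks)
    return [flank_mean, flank_mean - center_mean]


def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, Vector]:
    """Compute the mean feature vector of each labeled class."""
    centroids: Dict[int, Vector] = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    """Assign the label of the closest class centroid (squared distance)."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lab: sqdist(centroids[lab], x))


# Hypothetical 16-bp windows; the motif occupies positions 5..10.
MOTIF_START, MOTIF_END = 5, 11
bound_a = [9, 8, 10, 9, 9, 1, 0, 1, 0, 1, 0, 9, 10, 8, 9, 9]  # deep footprint
bound_b = [6, 7, 5, 6, 6, 2, 1, 2, 1, 2, 1, 6, 5, 7, 6, 6]    # shallower one
unbound_a = [3, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 3]
unbound_b = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1]

train_profiles = [bound_a, bound_b, unbound_a, unbound_b]
train_labels = [1, 1, 0, 0]  # 1 = bound, 0 = unbound (assumed labels)

X_train = [footprint_features(p, MOTIF_START, MOTIF_END) for p in train_profiles]
centroids = fit_centroids(X_train, train_labels)

# Classify two unseen, equally hypothetical motif sites.
new_site = [8, 7, 8, 9, 8, 1, 1, 0, 1, 1, 0, 8, 9, 7, 8, 8]
flat_site = [2] * 16
new_label = predict(centroids, footprint_features(new_site, MOTIF_START, MOTIF_END))
flat_label = predict(centroids, footprint_features(flat_site, MOTIF_START, MOTIF_END))
print(new_label, flat_label)
```

The two bound training profiles deliberately differ in footprint depth and flank signal, which is the kind of site-to-site variability that a model fit to labeled examples can absorb and a single fixed footprint shape cannot.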

AVAILABILITY AND IMPLEMENTATION

Python code is available at https://bitbucket.org/bryancquach/defcom.

CONTACT

bquach@email.unc.edu or tsfurey@email.unc.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
