Deconvolving sequence features that discriminate between overlapping regulatory annotations.

Author information

Kakumanu Akshay, Velasco Silvia, Mazzoni Esteban, Mahony Shaun

Affiliations

Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America.

Department of Biology, New York University, 100 Washington Square East, New York, NY, United States of America.

Publication information

PLoS Comput Biol. 2017 Oct 19;13(10):e1005795. doi: 10.1371/journal.pcbi.1005795. eCollection 2017 Oct.

Abstract

Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g. if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or associated with promoters. In order to meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, SeqUnwinder is able to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.
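The confounding problem described in the abstract can be made concrete with a small toy example. The sketch below is not SeqUnwinder's method; it only shows, with a deterministic made-up dataset, how a promoter-associated motif can look cell-type-specific when cell-type labels overlap unevenly with promoter proximity, and how comparing within each proximity stratum removes that apparent association. The motif strings, group sizes, and motif frequencies are all hypothetical values chosen for illustration.

```python
# Toy illustration of label confounding (hypothetical data, not SeqUnwinder itself).

PROMOTER_MOTIF = "CGCG"  # hypothetical motif tied to promoter proximity
CELL_MOTIF = "GATA"      # hypothetical motif tied to cell type A


def make_sites():
    """Build a deterministic set of labeled sites with overlapping labels."""
    # (cell type, promoter-proximal?, number of sites): cell type A sites
    # are mostly proximal, cell type B sites are mostly distal.
    groups = [("A", True, 80), ("A", False, 20), ("B", True, 20), ("B", False, 80)]
    sites = []
    for cell, proximal, n in groups:
        for i in range(n):
            seq = "T" * 12
            # Promoter motif: 75% of proximal sites, 25% of distal sites,
            # regardless of cell type.
            if i % 4 < (3 if proximal else 1):
                seq += PROMOTER_MOTIF + "TTTT"
            # Cell motif: 60% of type A sites, 10% of type B sites,
            # regardless of promoter proximity.
            if i % 10 < (6 if cell == "A" else 1):
                seq += CELL_MOTIF + "TTTT"
            sites.append({"cell": cell, "proximal": proximal, "seq": seq})
    return sites


def motif_rate(sites, motif, cell, proximal=None):
    """Fraction of sites with the given labels whose sequence contains the motif."""
    chosen = [
        s for s in sites
        if s["cell"] == cell and (proximal is None or s["proximal"] == proximal)
    ]
    return sum(motif in s["seq"] for s in chosen) / len(chosen)


def naive_difference(sites, motif):
    """Motif rate in cell type A minus cell type B, ignoring promoter proximity."""
    return motif_rate(sites, motif, "A") - motif_rate(sites, motif, "B")


def stratified_difference(sites, motif):
    """Average A-minus-B difference computed separately within each proximity stratum."""
    diffs = [
        motif_rate(sites, motif, "A", p) - motif_rate(sites, motif, "B", p)
        for p in (True, False)
    ]
    return sum(diffs) / len(diffs)


sites = make_sites()
for name, motif in [("promoter motif", PROMOTER_MOTIF), ("cell motif", CELL_MOTIF)]:
    print(name,
          "naive:", round(naive_difference(sites, motif), 3),
          "stratified:", round(stratified_difference(sites, motif), 3))
```

In this toy setup the promoter motif appears enriched in cell type A under the naive comparison, but the difference disappears once sites are compared within the same proximity stratum, while the cell motif keeps its enrichment. Stratification like this only works for a handful of labels; it is used here purely to show the problem, not to stand in for the deconvolution approach the abstract describes.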

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f1/5663517/53eeeec25ecc/pcbi.1005795.g001.jpg
