ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions.

Authors

Schroeder Jeremy W, Wolfe Michael B, Freddolino Lydia

Affiliations

Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.

Department of Biochemistry, University of Wisconsin - Madison, Madison, WI 53706, USA.

Publication information

bioRxiv. 2025 Jan 31:2025.01.28.635290. doi: 10.1101/2025.01.28.635290.

Abstract

Determining where transcriptional regulators bind within a genome is paramount to understanding how gene expression is regulated. Historically, position weight matrices (PWMs) have been used to define the binding preferences of DNA-binding proteins. However, PWMs treat the identity of each base in a sequence as an independent and additive measure of binding preference, which can limit their utility. Models that consider higher-order interactions between nearby bases yield greater success in predicting proteins' binding to DNA, but for many proteins there is still substantial room for improvement in predicting and understanding the determinants of proteins' binding to DNA. In addition to DNA sequence motifs, structural motifs (e.g., a narrow minor groove width) are important determinants of binding for some DNA-binding proteins. Despite the initial success of algorithms using structural features of DNA to predict binding properties of proteins from either ChIP-seq or SELEX data, there remains a need for a structural motif discovery framework which can be applied to data from a variety of experimental designs. Here, we present a unified workflow, capable of utilizing virtually any type of data representing sequence coverage or enrichment (e.g., ChIP-seq, RNA-seq, SELEX), to discover short structural motifs with explanatory power for a protein's DNA-binding preference. We couple the DNAshapeR algorithm with our own information-theoretic approach to motif discovery, and wrap shape and sequence motif inference and model selection into a single tool called ShapeME. Application of our structural motif discovery algorithm to proteins with ChIP-seq data in ENCODE datasets reveals a subset of proteins where short structural motifs outperform the best PWM for that protein as determined from the JASPAR database, or as identified by the sequence motif elicitation tool STREME. Our approach offers a powerful and versatile framework for inferring structural DNA-binding motifs, and will complement current sequence-based motif elicitation tools in discovery of protein-DNA interaction principles. A web-based interface to ShapeME is available at https://seq2fun.dcmb.med.umich.edu/shapeme, with full source code available at https://github.com/freddolino-lab/ShapeME.
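
The abstract's point that a PWM treats each base as an independent, additive contribution can be made concrete with a small sketch. The Python below is illustrative only and is not taken from ShapeME: the count matrix, the uniform background, the pseudocount, and the function names build_pwm, score_site, and best_hit are all assumptions made up for this example.

```python
import math

# Hypothetical count matrix for a 4-bp motif with consensus TATA.
# One dict per motif position; the counts are invented for illustration.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 9},
    {"A": 9, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 9},
    {"A": 9, "C": 1, "G": 1, "T": 1},
]


def build_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert per-position counts to log2-odds weights against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_site(pwm, site):
    """Score one site as the plain sum of per-position weights.

    No term depends on neighbouring bases, which is the independence and
    additivity assumption the abstract describes.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    return max(
        (score_site(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    score, start = best_hit(pwm, "GGTATAGG")
    print(f"best window starts at {start} with score {score:.2f}")
```

Because the score is a sum over positions, changing the third base from T to C costs the same amount whatever the other three bases are. A model of this kind therefore cannot express a preference that arises only from a combination of neighbouring bases or from the local DNA shape they produce, which is the gap that higher-order and structural motif models aim to fill.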
