Stability selection for regression-based models of transcription factor-DNA binding specificity.

Affiliation

Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA.

Publication information

Bioinformatics. 2013 Jul 1;29(13):i117-25. doi: 10.1093/bioinformatics/btt221.

Abstract

MOTIVATION

The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret.
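To make the independence assumption concrete, here is a minimal Python sketch; it is not taken from the paper. The toy probabilities (0.7 for the consensus base, 0.1 for each other base, uniform 0.25 background), the 10-bp window, and the helper names `log_odds_pwm`, `pwm_score` and `n_positional_kmer_features` are all hypothetical. A PWM scores a site as a sum of one log-odds term per position, so the effect of two substitutions is always exactly the sum of their individual effects. The last line counts position-specific mono-, di- and trinucleotide features to show how quickly the parameter count grows once dependencies between positions are allowed.

```python
import math

BASES = "ACGT"

def log_odds_pwm(consensus, p_match=0.7, background=0.25):
    """Toy log-odds PWM (hypothetical values): the consensus base has probability
    p_match at each position and the other three bases share the remainder."""
    p_other = (1.0 - p_match) / 3.0
    return [[math.log2((p_match if b == c else p_other) / background) for b in BASES]
            for c in consensus]

def pwm_score(seq, pwm):
    """Sum of per-position log-odds terms: each base contributes independently."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal PWM width")
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(seq))

def n_positional_kmer_features(width, k):
    """Number of (start position, k-mer) features in a window of the given width."""
    return (width - k + 1) * 4 ** k

pwm = log_odds_pwm("CACGTG")          # E-box consensus, used only as an example
wt = pwm_score("CACGTG", pwm)
d1 = pwm_score("AACGTG", pwm) - wt    # substitution at position 0 alone
d2 = pwm_score("CACGTA", pwm) - wt    # substitution at position 5 alone
double = pwm_score("AACGTA", pwm) - wt
# The double-mutant effect is exactly d1 + d2: a PWM cannot encode interactions.
print(math.isclose(double, d1 + d2))                             # True
print([n_positional_kmer_features(10, k) for k in (1, 2, 3)])    # [40, 144, 512]
```

Going from 40 mononucleotide parameters to 696 once di- and trinucleotides are included is the kind of growth that makes richer models harder to fit and interpret.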

RESULTS

We propose novel regression-based models of TF-DNA binding specificity, trained using high-resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity.
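The combination of positional k-mer features and feature selection can be illustrated with a small, self-contained Python sketch of stability selection. This is a simplified variant and not necessarily the authors' procedure: it assumes a lasso base learner fitted by cyclic coordinate descent with a single fixed penalty `lam`, random half-samples, and a selection-frequency threshold of 0.8. The function names, the penalty, the threshold, and the synthetic data (a hypothetical effect of a C at position 1 plus a CG dinucleotide at positions 2-3) are all illustrative assumptions. Features that the lasso picks in most subsamples are kept; features picked only occasionally are discarded.

```python
import random
from itertools import product

BASES = "ACGT"

def feature_names(length, ks=(1, 2, 3)):
    """All (start position, k-mer) pairs for a window of the given length."""
    return [(pos, "".join(kmer))
            for k in ks
            for pos in range(length - k + 1)
            for kmer in product(BASES, repeat=k)]

def encode(seq, names):
    """Binary indicator vector: 1.0 if the k-mer occurs at that start position."""
    return [1.0 if seq[pos:pos + len(kmer)] == kmer else 0.0 for pos, kmer in names]

def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(cols, n):
    """Center each column and scale it to unit variance; constant columns become None."""
    out = []
    for col in cols:
        mean = sum(col) / n
        centered = [v - mean for v in col]
        sd = (sum(v * v for v in centered) / n) ** 0.5
        out.append([v / sd for v in centered] if sd > 0 else None)
    return out

def lasso_support(cols, y, lam, n_iter=100, tol=1e-6):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 on
    standardized columns; returns the indices of nonzero coefficients."""
    n = len(y)
    xs = standardize(cols, n)
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]          # residual for beta = 0 (no intercept needed)
    beta = [0.0] * len(xs)
    for _ in range(n_iter):
        max_change = 0.0
        for j, x in enumerate(xs):
            if x is None:
                continue
            # Partial-residual correlation; uses (1/n)||x||^2 == 1 after standardizing.
            rho = sum(a * b for a, b in zip(x, r)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                r = [ri - delta * xi for ri, xi in zip(r, x)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return {j for j, b in enumerate(beta) if b != 0.0}

def stability_selection(rows, y, lam, n_subsamples=30, threshold=0.8, seed=0):
    """Refit the lasso on random half-samples and keep features selected in at
    least `threshold` of the fits."""
    rng = random.Random(seed)
    n, p = len(y), len(rows[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        cols = [[rows[i][j] for i in idx] for j in range(p)]
        for j in lasso_support(cols, [y[i] for i in idx], lam):
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    return [j for j, f in enumerate(freqs) if f >= threshold], freqs

# Synthetic, hypothetical data: 6-bp windows with mono- and dinucleotide features.
rng = random.Random(1)
L = 6
names = feature_names(L, ks=(1, 2))
seqs = ["".join(rng.choice(BASES) for _ in range(L)) for _ in range(300)]
# Hypothetical ground truth: a C at position 1 plus a CG dinucleotide at positions 2-3.
y = [1.0 * (s[1] == "C") + 2.0 * (s[2:4] == "CG") + rng.gauss(0, 0.1) for s in seqs]
rows = [encode(s, names) for s in seqs]
selected, freqs = stability_selection(rows, y, lam=0.1)
print([names[j] for j in selected])
```

Trinucleotide features can be added by passing `ks=(1, 2, 3)` to `feature_names`, at a noticeably higher cost in pure Python. Using selection frequency across subsamples, rather than a single lasso fit, is what keeps the final feature set small and stable when many overlapping k-mer features are correlated with one another.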

AVAILABILITY

Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
