Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding.

Authors

Li Jinsen, Sagendorf Jared M, Chiu Tsu-Pei, Pasi Marco, Perez Alberto, Rohs Remo

Affiliations

Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA.

Centre for Biomolecular Sciences and School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK.

Publication information

Nucleic Acids Res. 2017 Dec 15;45(22):12877-12887. doi: 10.1093/nar/gkx1145.

Abstract

Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. Prediction accuracies of our models, however, have not yet been compared to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC or MD simulations and XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently showed improved performance over sequence-based models regardless of data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four features. The expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.
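The idea of augmenting a sequence-based model with DNA shape information can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use DNAshapeR. It assumes, beyond what the abstract states, that a shape feature is read from a pentamer lookup table with a sliding window, and it fills that table with placeholder values (the GC fraction of each pentamer) rather than values derived from MC, MD, or XRC data. All function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Sequence-only (1-mer) features: four indicator values per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def toy_shape_table():
    """Placeholder pentamer -> shape value table. NOT real shape data.

    The GC fraction of each pentamer stands in for a value that would
    otherwise come from simulations or crystal structures.
    """
    table = {}
    for pentamer in product(BASES, repeat=5):
        key = "".join(pentamer)
        table[key] = sum(c in "GC" for c in key) / 5.0
    return table


def shape_profile(seq, table):
    """Slide a 5-bp window along seq and look up one value per window."""
    return [table[seq[i:i + 5]] for i in range(len(seq) - 4)]


def combined_features(seq, table):
    """Concatenate sequence features with the shape profile."""
    return one_hot(seq) + shape_profile(seq, table)


if __name__ == "__main__":
    features = combined_features("ACGTACGTAC", toy_shape_table())
    print(len(features))
```

For a 10-bp sequence this yields 40 sequence features plus 6 shape values, one per complete pentamer window. A feature vector of this kind could then be passed to any regression or classification model; adding further shape features would simply append more profiles of the same form.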

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ced/5728407/53b39a439d40/gkx1145fig1.jpg
