PeakRegressor identifies composite sequence motifs responsible for STAT1 binding sites and their potential rSNPs.

Affiliation

Computational Biology Research Center, Advanced Industrial Science and Technology, Tokyo, Japan.

Publication information

PLoS One. 2010 Aug 27;5(8):e11881. doi: 10.1371/journal.pone.0011881.

DOI:10.1371/journal.pone.0011881

PMID:20806061

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2929187/

Abstract

How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.
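
The regression step described in the abstract can be illustrated with a minimal sketch. It assumes that "L1-norm log linear regression" means a linear model fitted to log-transformed peak values with an L1 (lasso) penalty on the motif weights; the abstract does not spell out the exact formulation. The data here are synthetic, and the function names, the penalty value `lam`, and the motif-count feature matrix are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring removes the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n       # per-feature scale
    w = np.zeros(p)
    residual = yc.copy()                     # equals yc - Xc @ w throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # constant column carries no signal
            # Covariance of feature j with the residual that excludes feature j
            rho = Xc[:, j] @ residual / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            residual += Xc[:, j] * (w[j] - w_new)
            w[j] = w_new
    b0 = y_mean - x_mean @ w
    return b0, w


# Synthetic stand-in for real data (assumption): rows are ChIP-Seq peaks,
# columns are counts of candidate motifs found in each peak's sequence.
rng = np.random.default_rng(0)
n_peaks, n_motifs = 300, 20
X = rng.poisson(1.0, size=(n_peaks, n_motifs)).astype(float)

# Only motifs 0 and 3 truly influence the simulated peak height.
true_w = np.zeros(n_motifs)
true_w[0], true_w[3] = 0.8, 0.5
peak = np.exp(1.0 + X @ true_w + rng.normal(0.0, 0.3, n_peaks))

# Fit on log peak values; the L1 penalty drives irrelevant weights to zero.
b0, w = lasso_coordinate_descent(X, np.log(peak), lam=0.1)
pred = b0 + X @ w

selected = np.flatnonzero(w)                  # motifs kept by the model
r = np.corrcoef(pred, np.log(peak))[0, 1]     # predicted vs observed (log scale)
print("selected motif indices:", selected)
print("correlation coefficient:", round(r, 3))
```

The motifs with nonzero weights are the ones the model treats as informative, which is why an L1 penalty lends itself to motif identification: sparsity doubles as feature selection. The correlation printed at the end is the same kind of statistic the abstract reports (0.65 and 0.66), although the value obtained on this toy data says nothing about the real method's performance.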

Similar articles

MER41 repeat sequences contain inducible STAT1 binding sites.

PLoS One. 2010 Jul 6;5(7):e11425. doi: 10.1371/journal.pone.0011425.

Identification of regulatory regions of bidirectional genes in cervical cancer.

BMC Med Genomics. 2013;6 Suppl 1(Suppl 1):S5. doi: 10.1186/1755-8794-6-S1-S5. Epub 2013 Jan 23.

GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding.

Bioinformatics. 2016 Feb 15;32(4):490-6. doi: 10.1093/bioinformatics/btv565. Epub 2015 Oct 17.

atSNP: transcription factor binding affinity testing for regulatory SNP detection.

Bioinformatics. 2015 Oct 15;31(20):3353-5. doi: 10.1093/bioinformatics/btv328. Epub 2015 Jun 18.

Discovering motifs in ranked lists of DNA sequences.

PLoS Comput Biol. 2007 Mar 23;3(3):e39. doi: 10.1371/journal.pcbi.0030039.

Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP.

Nucleic Acids Res. 2009 Jul;37(12):e85. doi: 10.1093/nar/gkp381. Epub 2009 May 18.

Identification of human STAT5-dependent gene regulatory elements based on interspecies homology.

J Biol Chem. 2006 Sep 8;281(36):26216-24. doi: 10.1074/jbc.M605001200. Epub 2006 Jul 13.

A Panel of rSNPs Demonstrating Allelic Asymmetry in Both ChIP-seq and RNA-seq Data and the Search for Their Phenotypic Outcomes through Analysis of DEGs.

Int J Mol Sci. 2021 Jul 6;22(14):7240. doi: 10.3390/ijms22147240.

Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element.

Nucleic Acids Res. 1992 Jun 25;20(12):3139-45. doi: 10.1093/nar/20.12.3139.

Cited by

Discriminative motif analysis of high-throughput dataset.

Bioinformatics. 2014 Mar 15;30(6):775-83. doi: 10.1093/bioinformatics/btt615. Epub 2013 Oct 25.

References

Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP.

Nucleic Acids Res. 2009 Jul;37(12):e85. doi: 10.1093/nar/gkp381. Epub 2009 May 18.

A primer on regression methods for decoding cis-regulatory logic.

PLoS Comput Biol. 2009 Jan;5(1):e1000269. doi: 10.1371/journal.pcbi.1000269. Epub 2009 Jan 30.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.

Nat Biotechnol. 2009 Jan;27(1):66-75. doi: 10.1038/nbt.1518. Epub 2009 Jan 4.

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing.

Nat Methods. 2007 Aug;4(8):651-7. doi: 10.1038/nmeth1068. Epub 2007 Jun 11.

Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE.

Bioinformatics. 2006 Jul 15;22(14):e141-9. doi: 10.1093/bioinformatics/btl223.

Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data.

BMC Bioinformatics. 2004 Mar 18;5:31. doi: 10.1186/1471-2105-5-31.

Integrating regulatory motif discovery and genome-wide expression analysis.

Proc Natl Acad Sci U S A. 2003 Mar 18;100(6):3339-44. doi: 10.1073/pnas.0630591100. Epub 2003 Mar 7.

The RNA polymerase II core promoter: a key component in the regulation of gene expression.

Genes Dev. 2002 Oct 15;16(20):2583-92. doi: 10.1101/gad.1026202.
