Discovering protein-DNA binding sequence patterns using association rule mining.

Affiliation

Department of Computer Science & Engineering, The Chinese University of Hong Kong, and Hong Kong Bioinformatics Centre, Shatin, NT, Hong Kong, China.

Publication information

Nucleic Acids Res. 2010 Oct;38(19):6324-37. doi: 10.1093/nar/gkq500. Epub 2010 Jun 6.

DOI:10.1093/nar/gkq500
PMID:20529874
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2965231/
Abstract

Protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles for protein-DNA bindings. However, it is considered that there are no simple one-to-one rules between amino acids and nucleotides. Many methods impose complicated features beyond sequence patterns. Protein-DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF-TFBS binding sequence patterns in the most explicit and interpretable form from TRANSFAC. The framework is based on association rule mining with Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verifications from literatures, Protein Data Bank and homology modeling, there are strong evidences that the patterns discovered reveal real TF-TFBS bindings across different TFs and TFBSs, which can drive for further knowledge to better understand TF-TFBS bindings.
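To make the idea of association rule mining with Apriori concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each TF-TFBS record has already been reduced to a set of tagged k-mer items; the sequences, the `TF:`/`DNA:` tags, the thresholds, and the function names `apriori` and `derive_rules` are all invented for illustration and are not taken from TRANSFAC or the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def derive_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support_count, confidence) tuples."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, count, confidence))
    return found


# Hypothetical toy records: one set of amino-acid and nucleotide k-mers per binding pair.
transactions = [
    {"TF:RGKR", "DNA:TGACG"},
    {"TF:RGKR", "DNA:TGACG", "DNA:CACGT"},
    {"TF:RGKR", "DNA:CACGT"},
    {"TF:NRAA", "DNA:TGACG"},
]

frequent = apriori(transactions, min_support=2)
found = derive_rules(frequent, min_confidence=0.6)

for antecedent, consequent, count, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent), count, round(confidence, 2))
```

In this sketch support is an absolute count and confidence is the support of the whole itemset divided by the support of the antecedent. A real analysis would presumably restrict rules to those linking a TF-side item to a TFBS-side item and would tune both thresholds to the data.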

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/6beea352285e/gkq500f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/368f789d1734/gkq500f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/a22582cf9ee0/gkq500f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/c3efc3ff73dd/gkq500f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/71edf9d2f2a5/gkq500f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/8491f189c424/gkq500f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/8f4b01a008b2/gkq500f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a1e/2965231/fc2d8578ecb1/gkq500f8.jpg

Similar articles

1. Discovering protein-DNA binding sequence patterns using association rule mining.
   Nucleic Acids Res. 2010 Oct;38(19):6324-37. doi: 10.1093/nar/gkq500. Epub 2010 Jun 6.
2. Discovering approximate-associated sequence patterns for protein-DNA interactions.
   Bioinformatics. 2011 Feb 15;27(4):471-8. doi: 10.1093/bioinformatics/btq682. Epub 2010 Dec 30.
3. LASAGNA: a novel algorithm for transcription factor binding site alignment.
   BMC Bioinformatics. 2013 Mar 24;14:108. doi: 10.1186/1471-2105-14-108.
4. Subtypes of associated protein-DNA (Transcription Factor-Transcription Factor Binding Site) patterns.
   Nucleic Acids Res. 2012 Oct;40(19):9392-403. doi: 10.1093/nar/gks749. Epub 2012 Aug 16.
5. Molecular and structural considerations of TF-DNA binding for the generation of biologically meaningful and accurate phylogenetic footprinting analysis: the LysR-type transcriptional regulator family as a study model.
   BMC Genomics. 2016 Aug 27;17(1):686. doi: 10.1186/s12864-016-3025-3.
6. Discovering Binding Cores in Protein-DNA Binding Using Association Rule Mining with Statistical Measures.
   IEEE/ACM Trans Comput Biol Bioinform. 2015 Jan-Feb;12(1):142-54. doi: 10.1109/TCBB.2014.2343952.
7. Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules.
   BMC Bioinformatics. 2016 Nov 21;17(1):479. doi: 10.1186/s12859-016-1354-5.
8. Modeling associated protein-DNA pattern discovery with unified scores.
   IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):696-707. doi: 10.1109/TCBB.2013.60.
9. Discovering Protein-DNA Binding Cores by Aligned Pattern Clustering.
   IEEE/ACM Trans Comput Biol Bioinform. 2017 Mar-Apr;14(2):254-263. doi: 10.1109/TCBB.2015.2474376. Epub 2015 Aug 28.
10. An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs.
   BMC Bioinformatics. 2010 Nov 8;11:551. doi: 10.1186/1471-2105-11-551.

Cited by

1. mtDNA Single-Nucleotide Variants Associated with Type 2 Diabetes.
   Curr Issues Mol Biol. 2023 Oct 30;45(11):8716-8732. doi: 10.3390/cimb45110548.
2. Imbalanced target prediction with pattern discovery on clinical data repositories.
   BMC Med Inform Decis Mak. 2017 Apr 20;17(1):47. doi: 10.1186/s12911-017-0443-3.
3. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments.
   BMC Genomics. 2016 Nov 16;17(1):925. doi: 10.1186/s12864-016-3250-9.
4. icuARM-II: improving the reliability of personalized risk prediction in pediatric intensive care units.
   ACM BCB. 2014 Sep;2014:211-219. doi: 10.1145/2649387.2649440.
5. Computational learning on specificity-determining residue-nucleotide interactions.
   Nucleic Acids Res. 2015 Dec 2;43(21):10180-9. doi: 10.1093/nar/gkv1134. Epub 2015 Nov 2.
6. A primer to frequent itemset mining for bioinformatics.
   Brief Bioinform. 2015 Mar;16(2):216-31. doi: 10.1093/bib/bbt074. Epub 2013 Oct 26.
7. DNA motif elucidation using belief propagation.
   Nucleic Acids Res. 2013 Sep;41(16):e153. doi: 10.1093/nar/gkt574. Epub 2013 Jun 29.
8. Discovering associations in biomedical datasets by link-based associative classifier (LAC).
   PLoS One. 2012;7(12):e51018. doi: 10.1371/journal.pone.0051018. Epub 2012 Dec 5.
9. Fast rule-based bioactivity prediction using associative classification mining.
   J Cheminform. 2012 Nov 23;4(1):29. doi: 10.1186/1758-2946-4-29.
10. Subtypes of associated protein-DNA (Transcription Factor-Transcription Factor Binding Site) patterns.
   Nucleic Acids Res. 2012 Oct;40(19):9392-403. doi: 10.1093/nar/gks749. Epub 2012 Aug 16.

References

1. Structure of the intact PPAR-gamma-RXR-alpha nuclear receptor complex on DNA.
   Nature. 2008 Nov 20;456(7220):350-6. doi: 10.1038/nature07413.
2. Protein-DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins.
   Nucleic Acids Res. 2008 Oct;36(18):5922-32. doi: 10.1093/nar/gkn573. Epub 2008 Sep 18.
3. Structural basis for DNA recognition by FoxO1 and its regulation by posttranslational modification.
   Structure. 2008 Sep 10;16(9):1407-16. doi: 10.1016/j.str.2008.06.013.
4. Crystal structures of multiple GATA zinc fingers bound to DNA reveal new insights into DNA recognition and self-association by GATA.
   J Mol Biol. 2008 Sep 19;381(5):1292-306. doi: 10.1016/j.jmb.2008.06.072. Epub 2008 Jul 2.
5. Regulation of the transcription factor Ets-1 by DNA-mediated homo-dimerization.
   EMBO J. 2008 Jul 23;27(14):2006-17. doi: 10.1038/emboj.2008.117. Epub 2008 Jun 19.
6. Extracting sequence features to predict protein-DNA interactions: a comparative study.
   Nucleic Acids Res. 2008 Jul;36(12):4137-48. doi: 10.1093/nar/gkn361. Epub 2008 Jun 13.
7. DBD--taxonomically broad transcription factor predictions: new content and functionality.
   Nucleic Acids Res. 2008 Jan;36(Database issue):D88-92. doi: 10.1093/nar/gkm964. Epub 2007 Dec 11.
8. The 20 years of PROSITE.
   Nucleic Acids Res. 2008 Jan;36(Database issue):D245-9. doi: 10.1093/nar/gkm977. Epub 2007 Nov 14.
9. Prediction of DNA-binding residues from sequence.
   Bioinformatics. 2007 Jul 1;23(13):i347-53. doi: 10.1093/bioinformatics/btm174.
10. Improved benchmarks for computational motif discovery.
   BMC Bioinformatics. 2007 Jun 8;8:193. doi: 10.1186/1471-2105-8-193.