
Predicting transcription factor binding using ensemble random forest models.

Authors

Behjati Ardakani Fatemeh, Schmidt Florian, Schulz Marcel H

Affiliations

High throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbruecken, Saarland, 66123, Germany.

Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbruecken, Saarland, 66123, Germany.

Publication details

F1000Res. 2018 Oct 4;7:1603. doi: 10.12688/f1000research.16200.2. eCollection 2018.

DOI: 10.12688/f1000research.16200.2
PMID: 31723409
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6823902/
Abstract

Understanding the location and cell-type-specific binding of transcription factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging because TFs often bind only to short DNA motifs, and cell-type-specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif descriptions in the form of position-specific energy matrices (PSEMs). We use TF ChIP-seq data as a gold standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In this context, we consider different learning setups. Our results indicate that the ensemble learning approach generalizes better across tissues and cell types than individual tissue-specific classifiers or a classifier built on data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal. Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein-protein interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).
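To make the notion of a position-specific energy matrix more concrete, here is a minimal Python sketch of how a PSEM could be turned into a per-sequence motif feature. It is an illustration under simplifying assumptions, not the scoring procedure used in the paper: the matrix `TOY_PSEM` and the helper names `window_energy`, `affinity`, and `best_window` are hypothetical, the energy values are made up, and only the forward strand of an uppercase A/C/G/T sequence is scanned.

```python
import math

# Hypothetical 3-position PSEM: one dict per motif position giving the
# mismatch energy of each base (0.0 marks the preferred base).
# The numbers are illustrative only and not taken from any real motif.
TOY_PSEM = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 1.0},
    {"A": 2.0, "C": 0.0, "G": 1.5, "T": 2.0},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 2.0},
]


def window_energy(psem, window):
    """Sum the per-position energies of one motif-length window."""
    return sum(column[base] for column, base in zip(psem, window))


def affinity(psem, sequence):
    """Sum Boltzmann-style weights exp(-E) over all forward-strand windows.

    Returns 0.0 when the sequence is shorter than the motif.
    """
    width = len(psem)
    return sum(
        math.exp(-window_energy(psem, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    )


def best_window(psem, sequence):
    """Return (start, energy) of the lowest-energy window in the sequence."""
    width = len(psem)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    energy, start = min(
        (window_energy(psem, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
    return start, energy


if __name__ == "__main__":
    print(best_window(TOY_PSEM, "TTACGT"))
    print(affinity(TOY_PSEM, "TTACGT"))
```

A score of this kind, computed per TF motif over a candidate region, is the sort of feature a classifier could consume alongside the DNase1-seq signal. Lower energy means a better match, so the summed `exp(-E)` weight grows when a region contains windows close to the preferred motif.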


Similar articles

1. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility.
   BMC Bioinformatics. 2017 Jul 27;18(1):355. doi: 10.1186/s12859-017-1769-7.
2. Improved linking of motifs to their TFs using domain information.
   Bioinformatics. 2020 Mar 1;36(6):1655-1662. doi: 10.1093/bioinformatics/btz855.
3. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.
   PLoS Comput Biol. 2015 Aug 20;11(8):e1004418. doi: 10.1371/journal.pcbi.1004418. eCollection 2015 Aug.
4. BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data.
   Bioinformatics. 2015 Sep 1;31(17):2852-9. doi: 10.1093/bioinformatics/btv294. Epub 2015 May 7.
5. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random forest.
   BMC Genomics. 2018 Jan 19;19(Suppl 1):929. doi: 10.1186/s12864-017-4340-z.
6. On the problem of confounders in modeling gene expression.
   Bioinformatics. 2019 Feb 15;35(4):711-719. doi: 10.1093/bioinformatics/bty674.
7. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.
   Comput Biol Chem. 2016 Aug;63:62-72. doi: 10.1016/j.compbiolchem.2016.01.014. Epub 2016 Feb 13.
8. The next generation of transcription factor binding site prediction.
   PLoS Comput Biol. 2013;9(9):e1003214. doi: 10.1371/journal.pcbi.1003214. Epub 2013 Sep 5.
9. Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome.
   Genome Biol. 2022 Jun 10;23(1):126. doi: 10.1186/s13059-022-02690-2.

Cited by

1. Predicting CTCF cell type active binding sites in human genome.
   Sci Rep. 2024 Dec 30;14(1):31744. doi: 10.1038/s41598-024-82238-5.
2. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry.
   FEMS Microbiol Rev. 2023 Jul 5;47(4). doi: 10.1093/femsre/fuad030.
3. Benefiting from the intrinsic role of epigenetics to predict patterns of CTCF binding.
   Comput Struct Biotechnol J. 2023 May 12;21:3024-3031. doi: 10.1016/j.csbj.2023.05.012. eCollection 2023.
4. Computational approaches to understand transcription regulation in development.
   Biochem Soc Trans. 2023 Feb 27;51(1):1-12. doi: 10.1042/BST20210145.
5. Modeling binding specificities of transcription factor pairs with random forests.
   BMC Bioinformatics. 2022 Jun 3;23(1):212. doi: 10.1186/s12859-022-04734-7.
6. Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes.
   Int J Mol Sci. 2020 Jul 6;21(13):4787. doi: 10.3390/ijms21134787.