Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites.

Author information

Lavezzo Guilherme Miura, Lauretto Marcelo de Souza, Andrioli Luiz Paulo Moura, Machado-Lima Ariane

Affiliations

Universidade de São Paulo, Instituto de Matemática e Estatística, Programa Interunidades de Pós-Graduação em Bioinformática, São Paulo, SP, Brazil.

Universidade de São Paulo, Escola de Artes, Ciências e Humanidades, São Paulo, SP, Brazil.

Publication information

Genet Mol Biol. 2024 Jan 19;46(4):e20230048. doi: 10.1590/1678-4685-GMB-2023-0048. eCollection 2024.

DOI:10.1590/1678-4685-GMB-2023-0048

PMID:38285430

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10945726/

Abstract

Prediction of transcription factor binding sites (TFBS) is a Bioinformatics application in which DNA molecules are represented as sequences of the symbols A, C, G and T. The most widely used model for this problem is the Position Weight Matrix (PWM). Although PWMs have the advantage of simplicity, they cannot capture dependencies between nucleotide positions, which may affect prediction performance. The Acyclic Probabilistic Finite Automaton (APFA) is an alternative model able to accommodate position dependencies. However, an APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence the preference for PWMs or APFAs. This involved using position dependency features extracted from 1106 sets of TFBSs to infer a decision tree able to predict which model, PWM or APFA, is best for a given set of TFBSs. According to our results, as few as three selected features are enough to choose the best model, providing a balance between performance (average precision) and model simplicity.
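To make the limitation of PWMs concrete, the following is a minimal sketch, not the authors' implementation. It assumes a log-odds PWM with a uniform background and a pseudocount of 1; the function names `build_pwm` and `score` and the toy sites are hypothetical. Because each position is scored independently, a PWM trained on sites in which two positions are perfectly correlated still gives the same score to a pair that never co-occurs in the training data.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per position) from aligned sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            nt: math.log2(((counts[nt] + pseudocount) / total) / background)
            for nt in ALPHABET
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position weights; positions are treated as independent."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(column[nt] for column, nt in zip(pwm, seq))


# Hypothetical toy sites: the two positions are perfectly correlated,
# so only "AT" and "CG" ever occur.
sites = ["AT", "CG", "AT", "CG"]
pwm = build_pwm(sites)

observed = score(pwm, "AT")    # pair present in the training sites
unobserved = score(pwm, "AG")  # pair never seen in the training sites
print(observed == unobserved)  # the PWM cannot tell the two apart
```

A model that conditions each symbol on the preceding ones, as an APFA does, could assign these two sequences different probabilities, at the cost of estimating more parameters from the same number of sites.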
