Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites.

Authors

Lavezzo Guilherme Miura, Lauretto Marcelo de Souza, Andrioli Luiz Paulo Moura, Machado-Lima Ariane

Affiliations

Universidade de São Paulo, Instituto de Matemática e Estatística, Programa Interunidades de Pós-Graduação em Bioinformática, São Paulo, SP, Brazil.

Universidade de São Paulo, Escola de Artes, Ciências e Humanidades, São Paulo, SP, Brazil.

Publication information

Genet Mol Biol. 2024 Jan 19;46(4):e20230048. doi: 10.1590/1678-4685-GMB-2023-0048. eCollection 2024.

Abstract

Prediction of transcription factor binding sites (TFBSs) is a Bioinformatics application in which DNA molecules are represented as sequences of the symbols A, C, G and T. The most widely used model for this problem is the Position Weight Matrix (PWM). Although PWMs have the advantage of simplicity, they cannot capture dependencies between nucleotide positions, which may affect prediction performance. The Acyclic Probabilistic Finite Automaton (APFA) is an alternative model able to accommodate position dependencies. However, the APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence the preference for PWMs or APFAs. This involved using position dependency features extracted from 1106 sets of TFBSs to infer a decision tree able to predict which model, PWM or APFA, is best for a given set of TFBSs. According to our results, as few as three pinpointed features are enough to choose the best model, providing a balance between performance (average precision) and model simplicity.
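
To make the independence assumption behind a PWM concrete, the following is a minimal illustrative sketch, not the authors' implementation. The function names `build_pwm` and `score`, the pseudocount of 1, the uniform background of 0.25, and the toy binding sites are all assumptions introduced here. It shows that a PWM score is a plain sum of per-position terms, so no term can depend on the nucleotide at another position.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Estimate a log-odds PWM from aligned, equal-length binding sites.

    Each column is estimated separately from the others, which is exactly
    why a PWM cannot represent dependencies between positions.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start from the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm

def score(pwm, seq):
    """Score a candidate site as the sum of independent per-position log-odds."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Toy, hypothetical set of aligned binding sites (not data from the paper).
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
pwm = build_pwm(sites)

consensus_score = score(pwm, "ACGT")
unrelated_score = score(pwm, "GGGG")
```

In this toy setting the consensus-like sequence `ACGT` scores higher than `GGGG`. A model such as an APFA, by contrast, can condition the probability of a nucleotide on the path taken through earlier positions, at the cost of more parameters to estimate.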
