dipwmsearch: a Python package for searching di-PWM motifs.

Affiliations

LIRMM, Univ Montpellier, CNRS, Montpellier, France.

Institut Français de Bioinformatique, CNRS UAR 3601, Évry, France.

Publication information

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad141.

DOI:10.1093/bioinformatics/btad141

PMID:37010504

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10081870/

Abstract

MOTIVATION

Searching a sequence for probabilistic motifs is a common way to annotate putative transcription factor binding sites or other RNA/DNA binding sites. Useful motif representations include position weight matrices (PWMs), dinucleotide PWMs (di-PWMs), and hidden Markov models (HMMs). Dinucleotide PWMs not only retain the simplicity of PWMs (a matrix form and a cumulative scoring function) but also incorporate the dependency between adjacent positions in the motif, which PWMs disregard. For instance, the HOCOMOCO database provides di-PWM motifs derived from experimental data to represent binding sites. Currently, two programs, SPRy-SARUS and MOODS, can search for occurrences of di-PWMs in sequences.
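The difference between the two cumulative scoring functions can be illustrated with a minimal sketch. The matrices below are toy values invented for illustration (they are not HOCOMOCO data), and the function names `pwm_score` and `dipwm_score` are hypothetical helpers, not part of any package mentioned here. The sketch assumes the usual convention that a di-PWM with m columns scores words of length m + 1, one column per pair of adjacent positions.

```python
from itertools import product

ALPHABET = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(ALPHABET, repeat=2)]


def pwm_score(pwm, word):
    """Sum one weight per position; positions are scored independently."""
    if len(word) != len(pwm):
        raise ValueError("word length must equal the number of PWM columns")
    return sum(column[base] for column, base in zip(pwm, word))


def dipwm_score(dipwm, word):
    """Sum one weight per pair of adjacent positions."""
    if len(word) != len(dipwm) + 1:
        raise ValueError("word length must be the number of columns + 1")
    return sum(column[word[i:i + 2]] for i, column in enumerate(dipwm))


# Toy matrices with made-up weights (illustration only).
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": 2.0, "G": 0.0, "T": -1.0},
    {"A": 0.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
default_column = {d: -1.0 for d in DINUCLEOTIDES}
toy_dipwm = [
    {**default_column, "AC": 2.0, "AG": 0.5},  # pairs at positions 0-1
    {**default_column, "CG": 1.5, "GT": 1.0},  # pairs at positions 1-2
]

print(pwm_score(toy_pwm, "ACG"))      # 4.0
print(dipwm_score(toy_dipwm, "ACG"))  # 3.5
print(dipwm_score(toy_dipwm, "AGT"))  # 1.5
```

In the di-PWM case the weight contributed by the middle base depends on both of its neighbours, which is how adjacent-position dependency enters the score.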

RESULTS

We propose a Python package called dipwmsearch, which provides an original and efficient algorithm for this task: it first enumerates the words that match the di-PWM, and then searches for all of them at once in the sequence, even if the sequence contains IUPAC codes. Users benefit from easy installation via PyPI or conda, comprehensive documentation, and executable scripts that facilitate the use of di-PWMs.
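The enumerate-then-search idea can be sketched in plain Python. This is an illustrative sketch only, not the dipwmsearch API or its actual algorithm. It assumes that a word "matches" when its di-PWM score reaches a user-chosen threshold, it uses a simple branch-and-bound enumeration, and it replaces the simultaneous multi-word search with a set lookup over sliding windows. It does not handle IUPAC codes. All names and the toy matrix are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(ALPHABET, repeat=2)]


def enumerate_matching_words(dipwm, threshold):
    """Return every word of length len(dipwm) + 1 scoring >= threshold."""
    m = len(dipwm)
    # best_rest[i]: upper bound on the score obtainable from columns i..m-1.
    best_rest = [0.0] * (m + 1)
    for i in range(m - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(dipwm[i].values())

    words = []

    def extend(prefix, score):
        i = len(prefix) - 1  # index of the next column to apply
        if i == m:
            words.append(prefix)
            return
        for base in ALPHABET:
            new_score = score + dipwm[i][prefix[-1] + base]
            # Prune branches that cannot reach the threshold any more.
            if new_score + best_rest[i + 1] >= threshold:
                extend(prefix + base, new_score)

    if best_rest[0] >= threshold:
        for first in ALPHABET:
            extend(first, 0.0)
    return words


def search_words(words, sequence):
    """Report (position, word) for each window that is an enumerated word."""
    if not words:
        return []
    k = len(words[0])
    wanted = set(words)
    return [
        (pos, sequence[pos:pos + k])
        for pos in range(len(sequence) - k + 1)
        if sequence[pos:pos + k] in wanted
    ]


# Toy di-PWM with made-up weights (illustration only).
default_column = {d: -1.0 for d in DINUCLEOTIDES}
toy_dipwm = [
    {**default_column, "AC": 2.0, "AG": 0.5},
    {**default_column, "CG": 1.5, "GT": 1.0},
]

words = enumerate_matching_words(toy_dipwm, threshold=1.5)
hits = search_words(words, "TTACGTAGTAACG")
print(words)  # ['ACG', 'AGT']
print(hits)   # [(2, 'ACG'), (6, 'AGT'), (10, 'ACG')]
```

The set lookup works here only because all enumerated words have the same length. A multi-pattern matching automaton would be a natural alternative for searching the words simultaneously.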

AVAILABILITY AND IMPLEMENTATION

dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.
