Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy.
Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab303.
Hundreds of human proteins have been found to establish transient interactions with rather degenerate consensus DNA sequences, or motifs. Identifying these motifs and the genomic sites where the interactions occur represents one of the most challenging research goals in modern molecular biology and bioinformatics. The last twenty years have witnessed an explosion of computational tools designed to perform this task, whose performance was last compared fifteen years ago. Here, we survey sixteen of them, benchmark their ability to identify known motifs nested in twenty-nine simulated sequence datasets, and finally report their strengths, weaknesses, and complementarity.