ProSampler: an ultrafast and accurate motif finder in large ChIP-seq datasets for combinatory motif discovery.

Affiliations

School of Mathematics, Shandong University, Jinan 250100, China.

Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Publication information

Bioinformatics. 2019 Nov 1;35(22):4632-4639. doi: 10.1093/bioinformatics/btz290.

Abstract

MOTIVATION

The availability of numerous ChIP-seq datasets for transcription factors (TFs) has provided an unprecedented opportunity to identify all TF binding sites in genomes. However, progress has been hindered by the lack of a highly efficient and accurate tool to find not only the target motifs, but also cooperative motifs in very large datasets.

RESULTS

We herein present an ultrafast and accurate motif-finding algorithm, ProSampler, based on a novel numeration method and Gibbs sampler. ProSampler runs orders of magnitude faster than the fastest existing tools while often more accurately identifying motifs of both the target TFs and cooperators. Thus, ProSampler can greatly facilitate the efforts to identify the entire cis-regulatory code in genomes.

AVAILABILITY AND IMPLEMENTATION

Source code and binaries are freely available for download at https://github.com/zhengchangsulab/prosampler. ProSampler is implemented in C++ and is supported on Linux, macOS and MS Windows platforms.

SUPPLEMENTARY INFORMATION

Supplementary materials are available at Bioinformatics online.
