DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data.

Affiliations

Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France.

Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France.

Publication information

BMC Bioinformatics. 2018 Jun 11;19(1):223. doi: 10.1186/s12859-018-2215-1.

DOI:10.1186/s12859-018-2215-1

PMID:29890948

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5996464/

Abstract

BACKGROUND

Discovering over-represented approximate motifs in DNA sequences is an essential part of bioinformatics. This topic has been studied extensively because of the increasing number of potential applications. However, it remains a difficult challenge, especially given the huge quantity of data generated by high-throughput sequencing technologies. To overcome this problem, existing tools use greedy algorithms and probabilistic approaches to find motifs in reasonable time. Nevertheless, these approaches lack sensitivity and have difficulty coping with rare and subtle motifs.

RESULTS

We developed DiNAMO (for DNA MOtif), a new software tool based on an exhaustive and efficient algorithm for IUPAC motif discovery. We evaluated DiNAMO on synthetic and real datasets in two different applications, namely ChIP-seq peaks and Systematic Sequencing Error analysis. DiNAMO compares favorably with other existing methods and is robust to noise.
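To make the notion of an IUPAC motif concrete, the following Python sketch shows how a degenerate motif can be matched against sequences, and how one might count the sequences in a set that contain it. It is only an illustration under stated assumptions, not DiNAMO's algorithm: the function names `matches` and `count_sequences_with_motif`, the example sequences, and the idea of comparing a signal set against a control set are all hypothetical choices made for this sketch.

```python
# Standard IUPAC nucleotide codes, each mapped to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, kmer):
    """Return True if every base of kmer is allowed by the IUPAC motif."""
    if len(motif) != len(kmer):
        return False
    return all(base in IUPAC[code] for code, base in zip(motif, kmer))


def count_sequences_with_motif(motif, sequences):
    """Count the sequences that contain at least one occurrence of motif."""
    k = len(motif)
    return sum(
        any(matches(motif, seq[i:i + k]) for i in range(len(seq) - k + 1))
        for seq in sequences
    )


if __name__ == "__main__":
    # Hypothetical toy data: a "signal" set and a "control" set (assumption).
    signal = ["TTACGTAA", "GGACGAGG", "CCCCCCCC"]
    control = ["TTTTTTTT", "ACGCACGG", "GGGGACGT"]
    motif = "ACGW"  # W stands for A or T
    print(count_sequences_with_motif(motif, signal),
          count_sequences_with_motif(motif, control))
```

A motif that occurs in noticeably more signal sequences than control sequences would be a candidate for over-representation. How such a difference is scored statistically is not stated in the abstract and is left out of this sketch.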

CONCLUSIONS

We have shown that the DiNAMO software can serve as a tool to search for degenerate motifs in an exact manner using IUPAC models. DiNAMO can be used in scanning mode with sliding windows or in fixed position mode, which makes it suitable for numerous potential applications.
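The difference between a scanning mode and a fixed position mode can be sketched in a few lines of Python. This is a conceptual illustration only: the helper names `scan_positions` and `matches_at`, and the convention that the fixed position is a zero-based offset from the start of the sequence, are assumptions made for this example and are not taken from the DiNAMO software.

```python
# Standard IUPAC nucleotide codes, each mapped to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, kmer):
    """Return True if every base of kmer is allowed by the IUPAC motif."""
    if len(motif) != len(kmer):
        return False
    return all(base in IUPAC[code] for code, base in zip(motif, kmer))


def scan_positions(motif, seq):
    """Scanning mode (sketch): slide a window over seq, report every hit."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if matches(motif, seq[i:i + k])]


def matches_at(motif, seq, offset):
    """Fixed position mode (sketch): test only the window starting at offset.

    offset is a zero-based index from the start of seq (an assumption).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # A window that runs past the end of seq is too short and cannot match.
    return matches(motif, seq[offset:offset + len(motif)])


if __name__ == "__main__":
    print(scan_positions("ACGW", "ACGTACGA"))
    print(matches_at("ACGW", "TTACGT", 2))
```

Scanning suits cases where the motif may lie anywhere in a sequence, such as within a ChIP-seq peak, while a fixed position check suits cases where the motif is expected at a known location relative to an event of interest.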

AVAILABILITY

https://github.com/bonsai-team/DiNAMO

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6565/5996464/c3aa52262b92/12859_2018_2215_Fig1_HTML.jpg

Similar articles

An Efficient Algorithm for Discovering Motifs in Large DNA Data Sets.

IEEE Trans Nanobioscience. 2015 Jul;14(5):535-44. doi: 10.1109/TNB.2015.2421340. Epub 2015 Apr 9.

Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

J Bioinform Comput Biol. 2018 Feb;16(1):1740012. doi: 10.1142/S0219720017400121. Epub 2017 Dec 10.

A new algorithm for DNA motif discovery using multiple sample sequence sets.

J Bioinform Comput Biol. 2019 Aug;17(4):1950021. doi: 10.1142/S0219720019500215.

A general approach for discriminative de novo motif discovery from high-throughput data.

Nucleic Acids Res. 2013 Nov;41(21):e197. doi: 10.1093/nar/gkt831. Epub 2013 Sep 20.

WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data.

Sci Rep. 2017 Jun 12;7(1):3217. doi: 10.1038/s41598-017-03554-7.

BFC: correcting Illumina sequencing errors.

Bioinformatics. 2015 Sep 1;31(17):2885-7. doi: 10.1093/bioinformatics/btv290. Epub 2015 May 6.

Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data.

BMC Bioinformatics. 2015 Nov 9;16:375. doi: 10.1186/s12859-015-0797-4.

RSAT::Plants: Motif Discovery in ChIP-Seq Peaks of Plant Genomes.

Methods Mol Biol. 2016;1482:297-322. doi: 10.1007/978-1-4939-6396-6_19.

An Efficient Exact Algorithm for Planted Motif Search on Large DNA Sequence Datasets.

IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1542-1551. doi: 10.1109/TCBB.2024.3404136. Epub 2024 Oct 9.

Cited by

Proxi-RIMS-seq2 applied to native microbiomes uncovers hundreds of known and novel m5C methyltransferase specificities.

Nucleic Acids Res. 2025 Mar 20;53(6). doi: 10.1093/nar/gkaf226.

A Survey of Archaeal Restriction-Modification Systems.

Microorganisms. 2023 Sep 28;11(10):2424. doi: 10.3390/microorganisms11102424.
