WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data.

Affiliation

Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China.

Publication information

Sci Rep. 2017 Jun 12;7(1):3217. doi: 10.1038/s41598-017-03554-7.

DOI:10.1038/s41598-017-03554-7

PMID:28607381

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5468353/

Abstract

Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, most existing DMD methods must, because of computational expense, adopt approximate schemes that greatly restrict the search space, leading to a significant loss of predictive accuracy. In this paper, we propose Weakly-Supervised Motif Discovery (WSMD) to discover motifs from ChIP-seq datasets. In contrast to the learning strategies adopted by previous DMD methods, WSMD allows a "global" optimization of the motif parameters in continuous space, thereby reducing the information loss of the model representation and improving the quality of the resulting motifs. Meanwhile, by exploiting the connection between the DMD framework and existing weakly supervised learning (WSL) techniques, we also present highly scalable learning strategies for the proposed method. Experimental results on both real ChIP-seq datasets and synthetic datasets show that WSMD substantially outperforms earlier DMD methods (including DREME, HOMER, XXmotif, motifRG and DECOD) in predictive accuracy, while also achieving competitive computational speed.
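The abstract does not spell out the WSMD objective, so the following is only a minimal sketch of the general idea it alludes to, not the authors' implementation. It assumes a motif is a real-valued position weight matrix (PWM) and, in the usual weakly supervised (multiple-instance) view, that each sequence is a "bag" of windows whose score is that of its best window. The helper `log_odds_pwm`, the consensus `TGACTCA`, and the toy sequences are hypothetical and chosen purely for illustration.

```python
import math

BASES = "ACGT"

def window_score(pwm, kmer):
    """Sum of per-position weights for one window (one 'instance')."""
    return sum(pwm[i][BASES.index(b)] for i, b in enumerate(kmer))

def sequence_score(pwm, seq):
    """Bag-level score: the best-scoring window in the sequence."""
    k = len(pwm)
    return max(window_score(pwm, seq[i:i + k]) for i in range(len(seq) - k + 1))

def log_odds_pwm(consensus, match=0.85, background=0.25):
    """Hypothetical helper: build a real-valued log-odds PWM favouring `consensus`."""
    mismatch = (1.0 - match) / 3.0
    return [
        [math.log((match if b == c else mismatch) / background) for b in BASES]
        for c in consensus
    ]

# Toy data (assumed for illustration only): one sequence containing the site,
# one background sequence without it.
pwm = log_odds_pwm("TGACTCA")
bound = "CCGATGACTCAGGT"
unbound = "CCGATCCGGATGGT"

# A discriminative motif should score the bound sequence above the background one.
print(sequence_score(pwm, bound) > sequence_score(pwm, unbound))
```

Because the PWM entries are real numbers, a learner can in principle adjust them continuously to widen the gap between bound and background scores, instead of searching over a restricted discrete set of k-mers or IUPAC strings.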

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc3c/5468353/b36eb528f02a/41598_2017_3554_Fig1_HTML.jpg
