Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data.

Author information

Dabrowski Michal, Dojer Norbert, Krystkowiak Izabella, Kaminska Bozena, Wilczynski Bartek

Affiliations

Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Pasteura 3, Warszawa, 02-093, Poland.

Institute of Informatics, University of Warsaw, Banacha 2, Warszawa, 02-097, Poland.

Publication information

BMC Bioinformatics. 2015 May 1;16:140. doi: 10.1186/s12859-015-0573-5.

DOI:10.1186/s12859-015-0573-5

PMID:25927199

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4436866/

Abstract

BACKGROUND

For many years, the binding preferences of transcription factors (TFs) have been described by so-called motifs, usually defined mathematically by position weight matrices (PWMs) or similar models, for the purpose of predicting potential binding sites. However, despite the availability of thousands of motif models in public and commercial databases, a researcher who wants to use them faces many competing methods of identifying potential binding sites in a genome of interest, and there is little published information on which choices are optimal. Given the availability of a large number of motif models, as well as experimental datasets describing actual TF binding in hundreds of TF-ChIP-seq pairs, we set out to perform a comprehensive analysis of this matter.
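
As an illustration of what a position weight matrix model does, the following is a minimal sketch of converting base counts into log-odds scores and sliding the matrix along a sequence. The count matrix, the uniform background, the pseudocount, and the function names `log_odds_pwm` and `scan` are assumptions made for this example; they are not taken from the paper or from any of the databases or scanners it compares.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds matrix."""
    # Assumption: a uniform background unless one is supplied.
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Yield (offset, score) for every forward-strand window."""
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        yield start, sum(col[base] for col, base in zip(pwm, window))

# Hypothetical 4-column count matrix favouring the word "TGAC".
toy_counts = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
]
toy_pwm = log_odds_pwm(toy_counts)

# The best-scoring window is the candidate binding site.
best_offset, best_score = max(scan("CCTGACGG", toy_pwm), key=lambda hit: hit[1])
```

A real scanner would also score the reverse strand and handle ambiguous bases; both are left out here to keep the sketch short.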

RESULTS

We focus on the task of identifying potential transcription factor binding sites in the human genome. First, we provide a comprehensive comparison of the coverage and quality of models available in different databases, showing that the public databases have comparable TF coverage and better motif performance than commercial databases. Second, we compare different motif scanners, showing that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Third, we calculate for each motif a detection threshold that optimizes prediction accuracy. Finally, we provide an in-depth comparison of different methods of choosing thresholds for all motifs a priori. Surprisingly, we show that selecting a common false-positive rate gives results that are the least biased by the information content of the motif and therefore the most uniformly accurate.
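
The idea of choosing a threshold from a common false-positive rate can be sketched as follows. This is not the authors' implementation: the sampling-based estimate, the uniform background, the toy probability matrix, and the names `information_content`, `score`, and `threshold_for_fpr` are all assumptions made for illustration, and production tools may compute the background score distribution exactly rather than by sampling.

```python
import math
import random

BASES = "ACGT"

def information_content(ppm):
    """Total information content in bits, against a uniform background."""
    return sum(
        2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)
        for column in ppm
    )

def score(window, ppm):
    """Log2-odds score of one window against a uniform background.

    Assumes every probability is strictly positive (pseudocounts applied).
    """
    return sum(math.log2(col[base] / 0.25) for col, base in zip(ppm, window))

def threshold_for_fpr(ppm, fpr, n_samples=20000, seed=0):
    """Empirical cutoff reached by roughly `fpr` of random windows."""
    rng = random.Random(seed)
    width = len(ppm)
    scores = sorted(
        (score("".join(rng.choice(BASES) for _ in range(width)), ppm)
         for _ in range(n_samples)),
        reverse=True,
    )
    # Take the k-th highest background score. Scores are discrete, so
    # ties can push the realised rate above the requested one.
    k = max(1, int(fpr * n_samples))
    return scores[k - 1]

# Hypothetical 6-column motif with one moderately preferred base per column.
toy_ppm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}] * 6
flat_ppm = [{b: 0.25 for b in BASES}] * 6

cutoff_strict = threshold_for_fpr(toy_ppm, fpr=0.01)
cutoff_loose = threshold_for_fpr(toy_ppm, fpr=0.1)
```

Applying the same `fpr` to every motif ties the cutoff to each motif's own background score distribution, so it adapts to the motif's information content instead of using one fixed score for all motifs.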

CONCLUSION

We provide a guide for researchers working with transcription factor motifs. It is supplemented with detailed results of the analysis and the benchmark datasets, available at http://bioputer.mimuw.edu.pl/papers/motifs/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03aa/4436866/08dc5b2f6130/12859_2015_573_Fig1_HTML.jpg
