abc4pwm: affinity based clustering for position weight matrices in applications of DNA sequence analysis.

Affiliations

Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway.

Department of Medical Biochemistry, Oslo University Hospital and University of Oslo, Oslo, Norway.

Publication information

BMC Bioinformatics. 2022 Mar 3;23(1):83. doi: 10.1186/s12859-022-04615-z.

DOI:10.1186/s12859-022-04615-z
PMID:35240993
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8896320/
Abstract

BACKGROUND

Transcription factor (TF) binding motifs are identified by high-throughput sequencing technologies as a means of capturing protein-DNA interactions. These motifs are often represented by consensus sequences in the form of position weight matrices (PWMs). With an ever-increasing pool of TF binding motifs from multiple sources, redundancy is difficult to avoid, especially when every source maintains its own database. One solution is to cluster biologically relevant or similar PWMs, whether they come from experimental detection or in silico prediction. However, efficient tools for clustering PWMs are lacking, and assessing the quality of PWM clusters is a further challenge. New methods and tools are therefore required to cluster PWMs efficiently and to assess cluster quality.
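As an illustration of the PWM representation mentioned above, the following minimal sketch builds a count matrix, a log-odds PWM, and a consensus sequence from a handful of aligned binding sites. The sites, the pseudocount, and the uniform background are assumptions chosen for the example; none of the function names come from abc4pwm.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    counts = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[pos][base] += 1
    return counts

def to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds weights against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def consensus(counts):
    """Most frequent base at each position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in counts)

# Made-up aligned sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
counts = count_matrix(sites)
pwm = to_pwm(counts)
print(consensus(counts))
```

Positive weights mark bases that are enriched at a position relative to the background; negative weights mark depleted bases.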

RESULTS

A new Python package, Affinity Based Clustering for Position Weight Matrices (abc4pwm), was developed. It efficiently clustered PWMs from multiple sources with or without DNA-binding domain (DBD) information, generated a representative motif for each cluster, evaluated clustering quality automatically, and filtered out incorrectly clustered PWMs. It also updated the human DBD family database automatically, classified known human TF PWMs into their respective DBD families, and performed TF motif searching and motif discovery with a new ensemble learning approach.
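To make the clustering workflow concrete (group similar PWMs, build a representative motif, score cluster quality), here is a small sketch. It is not abc4pwm's API or algorithm: it substitutes a simple greedy threshold clustering on a Euclidean column similarity, assumes equal-length motifs with no alignment shifts, and uses made-up probability matrices. All names and the 0.8 threshold are hypothetical.

```python
import math

def column_similarity(p, q):
    """Similarity in [0, 1] between two probability columns (1 = identical)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return 1.0 - dist / math.sqrt(2.0)  # sqrt(2) is the largest possible distance

def motif_similarity(m1, m2):
    """Mean column similarity of two equal-length motifs (no alignment shift)."""
    if len(m1) != len(m2):
        raise ValueError("this sketch only compares equal-length motifs")
    return sum(column_similarity(p, q) for p, q in zip(m1, m2)) / len(m1)

def representative(matrices):
    """Column-wise average of the member matrices."""
    n = len(matrices)
    width = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(4)]
            for i in range(width)]

def greedy_cluster(motifs, threshold=0.8):
    """Put each motif in the first cluster whose representative is similar enough."""
    clusters = []  # each cluster is a list of motif names
    for name, matrix in motifs.items():
        for members in clusters:
            rep = representative([motifs[m] for m in members])
            if motif_similarity(matrix, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def cluster_quality(members, motifs):
    """Mean similarity of the members to their cluster representative."""
    rep = representative([motifs[m] for m in members])
    return sum(motif_similarity(motifs[m], rep) for m in members) / len(members)

# Made-up motifs; each column lists P(A), P(C), P(G), P(T).
motifs = {
    "A1": [[0.90, 0.03, 0.04, 0.03], [0.05, 0.05, 0.85, 0.05], [0.80, 0.10, 0.05, 0.05]],
    "A2": [[0.85, 0.05, 0.05, 0.05], [0.05, 0.05, 0.80, 0.10], [0.90, 0.04, 0.03, 0.03]],
    "B1": [[0.05, 0.85, 0.05, 0.05], [0.05, 0.05, 0.05, 0.85], [0.10, 0.10, 0.10, 0.70]],
}
clusters = greedy_cluster(motifs)
for members in clusters:
    print(members, round(cluster_quality(members, motifs), 3))
```

A member whose similarity to the representative falls well below the cluster average would be a candidate for filtering, which mirrors the quality-assessment step described above in a much simplified form.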

CONCLUSION

This work demonstrates applications of abc4pwm in DNA sequence analysis of various high-throughput sequencing data using ~1770 human TF PWMs. It recovered known TF motifs at gene promoters based on gene expression profiles (RNA-seq) and identified true TF binding targets for motifs predicted from ChIP-seq experiments. abc4pwm is a useful tool for TF motif searching, clustering, quality assessment, and integration in multiple types of sequence data analysis, including RNA-seq, ChIP-seq, and ATAC-seq.
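Motif searching, as mentioned above, typically means sliding a PWM along a sequence and scoring every window on both strands. The sketch below shows that generic idea under stated assumptions: a uniform background, a made-up GATA-like frequency matrix, and hypothetical function names. It does not reproduce abc4pwm's ensemble learning approach.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds(freqs, background=0.25):
    """Turn per-position base frequencies into log2-odds weights."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def best_hit(sequence, pwm):
    """Return (score, start, strand) of the highest-scoring window."""
    width = len(pwm)
    best = None
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for start in range(len(seq) - width + 1):
            score = sum(pwm[i][seq[start + i]] for i in range(width))
            if best is None or score > best[0]:
                # Report minus-strand hits in forward-strand coordinates.
                pos = start if strand == "+" else len(seq) - width - start
                best = (score, pos, strand)
    return best

def column(preferred):
    """Made-up frequency column that strongly prefers one base."""
    return {b: (0.85 if b == preferred else 0.05) for b in BASES}

gata_pwm = log_odds([column(b) for b in "GATA"])
print(best_hit("CCGATACC", gata_pwm))  # motif on the forward strand
print(best_hit("CCTATCCC", gata_pwm))  # motif on the reverse strand
```

In practice a score threshold or a p-value against a background model decides which windows count as binding sites; this sketch only reports the single best window.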
