GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.

Affiliations

Computer Science Department, University of Verona, Verona, Italy.

University of Tennessee Health Science Center, Memphis, Tennessee, United States of America.

Publication information

PLoS Comput Biol. 2021 Sep 27;17(9):e1009444. doi: 10.1371/journal.pcbi.1009444. eCollection 2021 Sep.

DOI:10.1371/journal.pcbi.1009444
PMID:34570769
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8519448/
Abstract

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
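
The idea of extending PWM scanning from a linear reference to a variation graph can be illustrated with a minimal sketch. This is not GRAFIMO's implementation or API: the toy graph, the 3-bp motif, the uniform background, and every function name below are hypothetical and chosen only for illustration. The sketch enumerates every k-mer spelled by a walk through the graph and scores it with a log-odds PWM; it does not restrict walks to observed haplotypes, which the abstract says GRAFIMO takes into account.

```python
import math

# Toy variation graph (hypothetical): node id -> sequence label, plus directed edges.
# Nodes 2 and 3 are the two alleles of a SNP; the reference goes through node 2.
NODES = {1: "AC", 2: "G", 3: "T", 4: "CA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}
REFERENCE_PATH = [1, 2, 4]

# Toy 3-bp motif as per-position base probabilities (hypothetical values).
PROBS = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def log_odds(probs, background=0.25):
    """Convert base probabilities into a log2-odds PWM against a uniform background."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in probs]


def graph_kmers(nodes, edges, k):
    """Yield (node_path, start_offset, kmer) for every k-mer spelled by a walk."""

    def extend(node, offset, prefix, path):
        seq = nodes[node][offset:]
        need = k - len(prefix)
        if len(seq) >= need:
            # The k-mer ends inside this node.
            yield tuple(path), prefix + seq[:need]
            return
        # Otherwise consume the rest of this node and branch into each successor.
        for nxt in edges[node]:
            yield from extend(nxt, 0, prefix + seq, path + [nxt])

    for node, label in nodes.items():
        for offset in range(len(label)):
            for path, kmer in extend(node, offset, "", [node]):
                yield path, offset, kmer


def scan(nodes, edges, pwm):
    """Score every graph k-mer; return a list of (score, kmer, node_path, offset)."""
    hits = []
    for path, offset, kmer in graph_kmers(nodes, edges, len(pwm)):
        score = sum(col[base] for col, base in zip(pwm, kmer))
        hits.append((score, kmer, path, offset))
    return hits


def restrict_to_path(nodes, edges, path):
    """Keep only the nodes and edges of one path, e.g. the linear reference."""
    keep = set(path)
    sub_nodes = {n: nodes[n] for n in path}
    sub_edges = {n: [m for m in edges[n] if m in keep] for n in path}
    return sub_nodes, sub_edges


PWM = log_odds(PROBS)

if __name__ == "__main__":
    ref_nodes, ref_edges = restrict_to_path(NODES, EDGES, REFERENCE_PATH)
    print("best hit, reference only:", max(scan(ref_nodes, ref_edges, PWM)))
    print("best hit, whole graph:   ", max(scan(NODES, EDGES, PWM)))
```

In this toy setup the highest-scoring site exists only on the walk through the alternative allele, so a reference-only scan misses it. That is the kind of site the abstract describes as "missed when scanning only the reference genome".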
