GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.

Author information

Computer Science Department, University of Verona, Verona, Italy.

University of Tennessee Health Science Center, Memphis, Tennessee, United States of America.

Publication information

PLoS Comput Biol. 2021 Sep 27;17(9):e1009444. doi: 10.1371/journal.pcbi.1009444. eCollection 2021 Sep.

Abstract

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for scanning known TF DNA motifs, represented as Position Weight Matrices (PWMs), in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes Project, we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
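
To make the idea of PWM scanning over a graph concrete, the following is a minimal Python sketch. It is not GRAFIMO's implementation and does not use the vg toolkit. The toy motif, the four-node graph, the uniform background and the score threshold are all hypothetical values chosen for illustration. The sketch enumerates every walk through the graph that spells a sequence of motif width, scores it with a log-odds PWM, and reports the walks above the threshold. It scans the forward strand only, computes no p-values, and does not restrict walks to observed haplotypes, which the abstract describes as part of what GRAFIMO considers.

```python
import math

# Hypothetical toy motif of width 3 (per-position base probabilities), not a real TF motif.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency

# Hypothetical toy variation graph: a C/T variant between two shared flanks.
# Walk 1 -> 3 -> 4 spells TTATGAA (taken here as the reference);
# walk 1 -> 2 -> 4 spells TTACGAA (the alternative allele).
NODES = {1: "TTA", 2: "C", 3: "T", 4: "GAA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def log_odds(pfm, bg=BACKGROUND):
    """Convert base probabilities into a log2-odds PWM."""
    return [{base: math.log2(p / bg) for base, p in col.items()} for col in pfm]


def kmers(nodes, edges, k):
    """Yield (node_path, kmer) for every walk in the graph spelling k bases."""

    def extend(node, offset, path, prefix):
        seq = nodes[node][offset:]
        need = k - len(prefix)
        if len(seq) >= need:
            # The current node supplies the remaining bases.
            yield path + (node,), prefix + seq[:need]
            return
        # Otherwise consume the whole node and branch into each successor.
        for nxt in edges[node]:
            yield from extend(nxt, 0, path + (node,), prefix + seq)

    for node in nodes:
        for offset in range(len(nodes[node])):
            yield from extend(node, offset, (), "")


def scan(nodes, edges, pfm, threshold):
    """Return (kmer, node_path, score) for every walk scoring >= threshold."""
    pwm = log_odds(pfm)
    hits = []
    for path, kmer in kmers(nodes, edges, len(pwm)):
        score = sum(col[base] for col, base in zip(pwm, kmer))
        if score >= threshold:
            hits.append((kmer, path, round(score, 3)))
    return hits


if __name__ == "__main__":
    for kmer, path, score in scan(NODES, EDGES, PFM, threshold=4.0):
        print(kmer, path, score)
```

In this toy graph the only walk above the threshold is ACG along nodes 1, 2, 4, which passes through the alternative allele. The corresponding reference walk spells ATG and scores well below the threshold, so a scan of the reference sequence alone would miss the site. This is the kind of variant-dependent occurrence the abstract describes.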
