MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, Oxford OX3 9DS, United Kingdom.
Computational Biology Research Group, MRC Weatherall Institute of Molecular Medicine, Oxford OX3 9DS, United Kingdom.
Genome Res. 2017 Oct;27(10):1730-1742. doi: 10.1101/gr.220202.117. Epub 2017 Sep 13.
In the era of genome-wide association studies (GWAS) and personalized medicine, predicting the impact of single nucleotide polymorphisms (SNPs) in regulatory elements is an important goal. Current approaches to determine the potential of regulatory SNPs depend on inadequate knowledge of cell-specific DNA binding motifs. Here, we present Sasquatch, a new computational approach that uses DNase footprint data to estimate and visualize the effects of noncoding variants on transcription factor binding. Sasquatch performs a comprehensive k-mer-based analysis of DNase footprints to determine any k-mer's potential for protein binding in a specific cell type and how this may be changed by sequence variants. Therefore, Sasquatch uses an unbiased approach, independent of known transcription factor binding sites and motifs. Sasquatch only requires a single DNase-seq data set per cell type, from any genotype, and produces consistent predictions from data generated by different experimental procedures and at different sequence depths. Here we demonstrate the effectiveness of Sasquatch using previously validated functional SNPs and benchmark its performance against existing approaches. Sasquatch is available as a versatile webtool incorporating publicly available data, including the human ENCODE collection. Thus, Sasquatch provides a powerful tool and repository for prioritizing likely regulatory SNPs in the noncoding genome.
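The idea of comparing k-mers across a variant can be illustrated with a minimal Python sketch. It is not Sasquatch's implementation: the function names, the `scores` lookup table, and the choice to sum scores are assumptions made for illustration only. It assumes that some per-k-mer footprint score has already been derived from DNase-seq data for one cell type, collects every k-mer that overlaps the variant position in the reference and alternate sequences, and reports the change in total score.

```python
from typing import Dict, List


def kmers_overlapping(seq: str, pos: int, k: int) -> List[str]:
    """Return every k-mer of `seq` that covers 0-based position `pos`."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]


def variant_effect(ref_seq: str, pos: int, alt_base: str,
                   scores: Dict[str, float], k: int) -> float:
    """Change in summed k-mer score (alt minus ref) caused by a SNP.

    `scores` is a hypothetical table mapping each k-mer to a footprint
    strength for one cell type; k-mers missing from it count as 0.0.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_total = sum(scores.get(km, 0.0)
                    for km in kmers_overlapping(ref_seq, pos, k))
    alt_total = sum(scores.get(km, 0.0)
                    for km in kmers_overlapping(alt_seq, pos, k))
    return alt_total - ref_total


# Toy values only, not real footprint data.
toy_scores = {"CGT": 1.0, "GTA": 0.5, "CGA": 0.25}
effect = variant_effect("ACGTACG", pos=3, alt_base="A", scores=toy_scores, k=3)
print(effect)
```

In this toy case a negative value means the alternate allele removes k-mers with stronger footprints than those it creates. How real footprint strength is quantified, and how per-k-mer values are combined, is specific to the published method and is not reproduced here.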