Department of Genetics.
Department of Health Research & Policy.
Bioinformatics. 2017 Dec 15;33(24):3895-3901. doi: 10.1093/bioinformatics/btx534.
Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies.
We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types.
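The AUC values quoted above can be read as the probability that a randomly chosen cis-eQTL SNV receives a higher score than a randomly chosen non-eQTL SNV. The following is a minimal sketch of that rank-based definition, not the authors' evaluation code: the function name `auc` and the toy scores and labels are hypothetical, and the cross-validation used in the paper is not reproduced here.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not FIRE output): 1 = cis-eQTL SNV, 0 = non-eQTL SNV.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}")  # 8 of 9 pairs ranked correctly -> AUC = 0.889
```

An AUC of 0.5 corresponds to scores that carry no ranking information, and 1.0 to perfect separation, which gives a scale for reading the reported 0.807 and 0.939.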
FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/.
Supplementary data are available at Bioinformatics online.