Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.
Department of Human Genetics, University of Michigan, Ann Arbor, Michigan.
Hum Mutat. 2019 Sep;40(9):1292-1298. doi: 10.1002/humu.23791. Epub 2019 Jun 22.
Here we present a computational model, Score of Unified Regulatory Features (SURF), that predicts functional variants in enhancer and promoter elements. SURF is trained on data from massively parallel reporter assays and predicts the effect of variants on reporter expression levels. It achieved the top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" challenge. We also show that features queried through RegulomeDB, which are direct annotations from functional genomics data, improve prediction accuracy beyond what is achieved with transfer learning features from DNA sequence-based deep learning models. The most important features include DNase footprints, especially when coupled with complementary ChIP-seq data. Furthermore, we found that our model performs well in predicting allele-specific transcription factor binding events. As an extension to the current scoring system in RegulomeDB, we expect our computational model to prioritize variants in regulatory regions and thereby aid the understanding of functional variants in noncoding regions that lead to disease.