Computer Science Department, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
Nat Commun. 2020 Dec 2;11(1):6168. doi: 10.1038/s41467-020-19962-9.
Annotations of evolutionary sequence constraint based on multi-species genome alignments and genome-wide maps of epigenomic marks and transcription factor binding provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence of each base in the genome being in an evolutionarily constrained non-exonic element from an input of over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of constrained non-exonic bases is still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome.