Department of Systems Biology, Columbia University, New York, NY, 10032, USA.
Department of Pediatrics, Columbia University, New York, NY, 10032, USA.
Nat Commun. 2018 May 30;9(1):2138. doi: 10.1038/s41467-018-04552-7.
Haploinsufficiency is a major mechanism of genetic risk in developmental disorders. Accurate prediction of haploinsufficient genes is essential for prioritizing and interpreting deleterious variants in genetic studies. Current methods based on mutation intolerance in population data have inadequate power for genes with short transcripts. Here we show that haploinsufficiency is strongly associated with epigenomic patterns, and we develop a computational method (Episcore) that uses machine learning to predict haploinsufficiency from epigenomic data across a broad range of tissue and cell types. Based on data from recent exome sequencing studies of developmental disorders, Episcore outperforms current methods in prioritizing likely-gene-disrupting (LGD) de novo variants. We further show that Episcore is less biased by gene size and is complementary to mutation intolerance metrics for prioritizing LGD variants. Our approach enables new applications of epigenomic data and facilitates the discovery and interpretation of novel risk variants implicated in developmental disorders.