Ecole Normale Supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Paris, France.
PLoS Genet. 2022 Apr 29;18(4):e1010191. doi: 10.1371/journal.pgen.1010191. eCollection 2022 Apr.
Whole genome sequencing (WGS) is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants remain a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20-80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing in particular to optimized selection of control variants during training. In addition to ranking candidate variants, FINSURF breaks down the score for each variant into contributions from individual annotations, facilitating the evaluation of their functional relevance. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the top ten hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.
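The "top ten hits" criterion above can be illustrated with a minimal sketch. It assumes each patient's candidate variants have already been scored, with higher scores meaning a stronger predicted functional impact. The function names, variant identifiers and scores below are hypothetical and are not part of FINSURF; the sketch only shows how a rank-based success count could be computed.

```python
from typing import Dict, List


def rank_of_causal(scores: Dict[str, float], causal_id: str) -> int:
    """Return the 1-based rank of the causal variant among all candidates,
    ordered from highest to lowest score (hypothetical helper)."""
    ordered = sorted(scores, key=lambda variant: scores[variant], reverse=True)
    return ordered.index(causal_id) + 1


def top_k_hits(ranks: List[int], k: int = 10) -> int:
    """Count the cases in which the causal variant ranks within the top k."""
    return sum(1 for rank in ranks if rank <= k)


# Toy data: made-up scores for two cases, not results from the paper.
case_a = {"var1": 0.91, "var2": 0.40, "var3": 0.75}
case_b = {f"var{i}": 1.0 - 0.05 * i for i in range(20)}

ranks = [
    rank_of_causal(case_a, "var3"),   # second-highest score in case_a
    rank_of_causal(case_b, "var15"),  # sixteenth-highest score in case_b
]
hits = top_k_hits(ranks, k=10)
print(f"{hits} of {len(ranks)} causal variants ranked in the top 10")
```

Under this counting scheme, the result reported above (22 of 30 diseases) corresponds to a top-ten success rate of roughly 73%.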