MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom.
PLoS One. 2024 Aug 22;19(8):e0307312. doi: 10.1371/journal.pone.0307312. eCollection 2024.
Many dominant genetic disorders result from protein-altering mutations, acting primarily through dominant-negative (DN), gain-of-function (GOF), and loss-of-function (LOF) mechanisms. Deciphering the mechanisms by which dominant diseases exert their effects is often experimentally challenging and resource-intensive, but is essential for developing appropriate therapeutic approaches. Diseases that arise via a LOF mechanism are more amenable to treatment by conventional gene therapy, whereas DN and GOF mechanisms may require gene editing or targeting by small molecules. Moreover, nearly all currently available variant effect predictors identify pathogenic missense mutations that act via DN and GOF mechanisms less readily than those that act via LOF. Here, we introduce a tripartite statistical model made up of support vector machine binary classifiers trained to predict whether human protein-coding genes are likely to be associated with DN, GOF, or LOF molecular disease mechanisms. We test the utility of the predictions by examining biologically and clinically meaningful properties known to be associated with the mechanisms. Our results strongly support the conclusion that the models generalise to unseen data, and they offer insight into the functional attributes of proteins associated with different mechanisms. We hope that our predictions will serve as a springboard for researchers studying novel variants and those of uncertain clinical significance, guiding variant interpretation strategies and experimental characterisation. Predictions for the human UniProt reference proteome are available at https://osf.io/z4dcp/.
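The idea of a tripartite model built from binary support vector machine classifiers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three feature values per gene, the toy labels, the "mechanism vs. rest" framing, the linear kernel, and all hyperparameters are assumptions made purely for illustration, and the function names `train_linear_svm`, `train_tripartite`, and `score_gene` are hypothetical.

```python
# Toy sketch: one linear SVM per mechanism, each trained as "mechanism vs. rest".
# All feature values, labels and hyperparameters below are hypothetical.

MECHANISMS = ("DN", "GOF", "LOF")


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_tripartite(X, labels):
    """Train three independent binary classifiers, one per mechanism."""
    models = {}
    for mech in MECHANISMS:
        y = [1 if lab == mech else -1 for lab in labels]
        models[mech] = train_linear_svm(X, y)
    return models


def score_gene(models, x):
    """Return the signed decision value of each binary classifier for one gene."""
    return {
        mech: sum(wj * xj for wj, xj in zip(w, x)) + b
        for mech, (w, b) in models.items()
    }


# Six hypothetical genes described by three made-up, normalised features.
X = [
    [0.9, 0.1, 0.0], [0.8, 0.0, 0.2],  # labelled DN
    [0.1, 0.9, 0.0], [0.0, 0.8, 0.2],  # labelled GOF
    [0.0, 0.1, 0.9], [0.2, 0.0, 0.8],  # labelled LOF
]
labels = ["DN", "DN", "GOF", "GOF", "LOF", "LOF"]

models = train_tripartite(X, labels)
scores = score_gene(models, [1.0, 0.0, 0.0])  # a hypothetical unseen gene
for mech in MECHANISMS:
    print(mech, round(scores[mech], 2))
```

Each classifier returns its own signed score, so a gene is not forced into a single class: it may score positively for more than one mechanism, or for none. In practice the decision values would need to be calibrated before being read as probabilities.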