Division of Genomics and Translational Biomedicine, College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar.
Hamad Dental Center, Hamad Medical Corporation, Doha, Qatar.
Physiol Genomics. 2023 Aug 1;55(8):315-323. doi: 10.1152/physiolgenomics.00033.2023. Epub 2023 Jun 19.
Identification of novel variants outpaces their clinical annotation, which highlights the importance of developing accurate computational methods for risk assessment. Therefore, our aim was to develop a gene-specific machine learning model to predict the pathogenicity of all types of variants and to apply this model and our previous gene-specific model to assess variants of uncertain significance (VUS) among Qatari patients with breast cancer. We developed an XGBoost model that utilizes variant information such as position, frequency, and consequence, as well as prediction scores from numerous in silico tools. We trained and tested the model with variants that were reviewed and classified by the Evidence-Based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium. In addition, we tested the model's performance on an independent set of missense variants of uncertain significance with experimentally determined functional scores. The model performed excellently in predicting the pathogenicity of ENIGMA-classified variants (accuracy: 99.9%) and in predicting the functional consequence of the independent set of missense variants (accuracy: 93.4%). Moreover, it predicted 2,115 potentially pathogenic variants among the 31,058 unreviewed variants in the exchange database. Using two gene-specific models, we did not identify any pathogenic variants among those found in patients in Qatar, but we predicted four potentially pathogenic variants, which could be prioritized for functional validation.
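As a small illustration of the figures quoted in the abstract, the sketch below shows how an accuracy value is computed from reference and predicted labels, and what share of the unreviewed variants the 2,115 predicted-pathogenic calls represent. The `accuracy` helper and the toy label lists are hypothetical and are not taken from the study; only the two counts (2,115 and 31,058) come from the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels (1 = pathogenic, 0 = benign); NOT data from the study.
toy_true = [1, 0, 0, 1, 0, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 0, 0, 0, 0]
toy_accuracy = accuracy(toy_true, toy_pred)  # 7 of 8 labels agree

# Counts reported in the abstract.
predicted_pathogenic = 2115
unreviewed_variants = 31058
flagged_share = predicted_pathogenic / unreviewed_variants

print(f"toy accuracy: {toy_accuracy:.3f}")
print(f"share of unreviewed variants flagged: {flagged_share:.2%}")
```

On the reported counts, the flagged share works out to roughly 6.8% of the unreviewed variants.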