BGI-Shenzhen, Shenzhen 518083, China.
Division of Clinical Immunology at the Department of Laboratory Medicine, Karolinska Institutet at Karolinska University Hospital Huddinge, SE-141 86 Stockholm, Sweden.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac176.
Distinguishing pathogenic variants from non-pathogenic ones remains a major challenge in clinical genetic testing of primary immunodeficiency (PID) patients. Most existing mutation pathogenicity prediction tools treat all mutations as homogeneous entities, ignoring differences in the characteristics of individual genes, and apply the same model to genes involved in different diseases. In this study, we developed a single nucleotide variant (SNV) pathogenicity prediction tool, Variant Impact Predictor for PIDs (VIPPID; https://mylab.shinyapps.io/VIPPID/), which was tailored for PID genes and used a specific model for each of the most prevalent known PID genes. It employed a Conditional Inference Forest model and utilized information from 85 features of SNVs and scores from 20 existing prediction tools. Evaluation of VIPPID showed that it had superior performance (area under the curve = 0.91) over non-specific conventional tools. We also showed that the gene-specific model outperformed the non-gene-specific models. Our study demonstrated that disease-specific and gene-specific models can improve SNV pathogenicity prediction performance. This observation supports the notion that each feature of mutations in the model can potentially be used, in a new algorithm, to investigate the characteristics and function of the encoded proteins.
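The abstract reports performance as area under the curve (AUC) and compares gene-specific with non-gene-specific models. As a minimal sketch of how such a per-gene comparison could be scored, the Python below computes AUC by pairwise comparison (the Mann-Whitney formulation) and groups it by gene. It is not VIPPID's code: the function names `auc` and `auc_by_gene`, the gene labels `GENE_A` and `GENE_B`, and all scores are hypothetical toy values chosen only for illustration.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    labels: 1 = pathogenic, 0 = non-pathogenic.
    scores: predicted pathogenicity, higher means more likely pathogenic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_by_gene(records):
    """Compute AUC separately per gene from (gene, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for gene, label, score in records:
        grouped[gene][0].append(label)
        grouped[gene][1].append(score)
    return {gene: auc(ys, ss) for gene, (ys, ss) in grouped.items()}


# Hypothetical toy data, not taken from the study.
toy = [
    ("GENE_A", 1, 0.9), ("GENE_A", 0, 0.2),
    ("GENE_A", 1, 0.6), ("GENE_A", 0, 0.7),
    ("GENE_B", 1, 0.8), ("GENE_B", 0, 0.1),
]
per_gene = auc_by_gene(toy)
print(per_gene)
```

Scoring each gene separately, rather than pooling all variants, is what would make a difference between gene-specific and non-gene-specific models visible; the pairwise loop is quadratic and is meant only for small illustrative inputs.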