Alay Mustafa Tarık
Department of Medical Genetics, Ankara Etlik City Hospital, Ankara, Turkey.
Sci Rep. 2025 Mar 17;15(1):9054. doi: 10.1038/s41598-025-94142-7.
There is a large discrepancy between the clinical categorization of MEFV gene variants and in silico tool predictions. In this study, we developed a seven-tier classification system for MEFV missense variants of unknown significance (VOUS) and recommend a generalized pipeline for classifying variants in other genes. We extracted 12,017 human MEFV gene variants from the Ensembl database, of which 6034 were missense variants. We then selected 42 in silico tools for our classification model and determined the optimal value from the scores of three in silico tools. For the machine learning step, we used two bagging methods and two boosting methods. After predicting known variants, we applied our model to 5507 VOUS. In the final stage, we applied the developed framework to the entire dataset to rigorously evaluate its classification performance and validate its potential clinical utility. The XGBoost model achieved the highest accuracy at 0.9882 (± 0.0295), followed by Extremely Randomized Trees (0.9835 ± 0.0335), Random Forest (0.9788 ± 0.0158), and AdaBoost (0.9671 ± 0.0815). After the dataset was refined and a novel classification and clustering methodology was introduced, the proportion of known variants increased from 6.9% to 29.4%, a 4.3-fold relative increase. Furthermore, we identified two novel hotspot regions and one tolerant site, offering insights into the functional structure of the pyrin protein. Rigid and adaptive classifiers offer an innovative framework for VOUS classification that integrates a grayscale interpretation system with in silico tools and machine learning algorithms. This approach not only improves the accuracy of MEFV gene variant classification but also identifies new hotspot regions for functional studies. It paves the way for scalable applications to other genes and might contribute to advancing precision genomic medicine in the future.
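
The reported fold change can be checked directly from the two proportions given in the abstract. The short sketch below is only an illustrative arithmetic check, not part of the authors' pipeline; the function name `fold_change` and the variable names are assumptions introduced here.

```python
def fold_change(before_pct: float, after_pct: float) -> float:
    """Return the ratio of the later proportion to the earlier one."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return after_pct / before_pct


# Proportions of variants with a known classification, as stated in the abstract.
known_before_pct = 6.9
known_after_pct = 29.4

ratio = fold_change(known_before_pct, known_after_pct)

# Absolute gain in percentage points, rounded to avoid floating-point noise.
gain_points = round(known_after_pct - known_before_pct, 1)

print(f"{ratio:.1f}-fold relative increase, {gain_points} percentage points")
```

The ratio rounds to the 4.3-fold figure quoted above. The absolute gain of 22.5 percentage points is a separate quantity and should not be confused with the relative increase.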