Sobahy Turki M, Motwalli Olaa, Alazmi Meshari
IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):796-801. doi: 10.1109/TCBB.2022.3155659. Epub 2023 Feb 3.
BACKGROUND & OBJECTIVE: Genomic medicine stands to be revolutionized by understanding single nucleotide variants (SNVs) and their expression in single-gene disorders (Mendelian diseases). Computational tools can play a vital role in exploring such variants and their pathogenicity. We therefore developed the ensemble prediction tool AllelePred to identify deleterious SNVs and disease-causing genes.
The model draws on different population genetics backgrounds and strict feature selection criteria to help generate high-accuracy results. Compared with other tools, such as Eigen, PROVEAN, and fathmm-MKL, our classifier achieves higher accuracy (98%), precision (96%), F1 score (93%), and coverage (100%) for different types of coding variants. The new method was also compared against a bioinformatics analytical workflow that uses gnomAD overall allele frequencies (AFs; less than 1%) and CADD (scaled C-score of at least 15). Furthermore, this research highlights the importance of genetic variant sharing and curation. We accumulated a list of highly probable deleterious variants and recommend further experimental validation before medical diagnostic use.
The ensemble prediction tool AllelePred improves the accuracy of identifying deleterious SNVs and genetic determinants in real clinical data.
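The comparison workflow mentioned in the abstract (gnomAD overall AF below 1% combined with a CADD scaled C-score of at least 15) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, the function names, the handling of missing values, and the toy inputs are all assumptions made for illustration. The metric helper shows how accuracy, precision, and F1 score are conventionally computed from binary predictions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Thresholds taken from the abstract's comparison workflow.
AF_THRESHOLD = 0.01    # gnomAD overall allele frequency must be below 1%
CADD_THRESHOLD = 15.0  # CADD scaled C-score must be at least 15


@dataclass
class Variant:
    """Hypothetical variant record; the field names are illustrative only."""
    variant_id: str
    gnomad_af: Optional[float]   # None if the variant is absent from gnomAD
    cadd_scaled: Optional[float]  # None if no CADD score is available


def passes_baseline_filter(v: Variant, missing_af_is_rare: bool = True) -> bool:
    """Flag a variant as candidate-deleterious under the AF + CADD rule.

    Assumption: a variant absent from gnomAD is treated as rare by default,
    and a variant without a CADD score is never flagged.
    """
    if v.cadd_scaled is None:
        return False
    rare = missing_af_is_rare if v.gnomad_af is None else v.gnomad_af < AF_THRESHOLD
    return rare and v.cadd_scaled >= CADD_THRESHOLD


def classification_metrics(predicted: List[bool], truth: List[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from binary calls."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(not p and t for p, t in zip(predicted, truth))
    tn = sum(not p and not t for p, t in zip(predicted, truth))
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy inputs (invented for illustration; not data from the study).
toy_variants = [
    Variant("v1", 0.0001, 25.0),  # rare and high CADD
    Variant("v2", 0.05, 30.0),    # too common
    Variant("v3", 0.0005, 8.0),   # CADD below threshold
    Variant("v4", None, 20.0),    # absent from gnomAD, treated as rare
    Variant("v5", 0.001, None),   # no CADD score
]
toy_calls = [passes_baseline_filter(v) for v in toy_variants]
```

A threshold rule like this leaves variants without a score unclassified or defaulted, which is one reason coverage is reported alongside accuracy when tools are compared.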