CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.
PLoS One. 2024 May 17;19(5):e0303787. doi: 10.1371/journal.pone.0303787. eCollection 2024.
Advances in next-generation sequencing have made rapid variant discovery and detection widely accessible. To facilitate a better understanding of the nature of these variants, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) have issued a set of guidelines for variant classification. However, given the vast number of variants associated with any disorder, it is not feasible to apply these guidelines manually to all known variants. Machine learning methodologies offer a rapid way to classify large numbers of variants, including variants of uncertain significance, as either pathogenic or benign. Here we classify ATP7B genetic variants by employing ML and AI algorithms trained on our well-annotated WilsonGen dataset.
We trained and validated two algorithms, TabNet and XGBoost, on a high-confidence dataset of manually annotated, ACMG-AMP-classified variants of the ATP7B gene, which is associated with Wilson's disease.
Using an independent validation dataset of ACMG-AMP-classified variants, as well as a patient set of functionally validated variants, we showed how both algorithms perform and how they can be used to classify large numbers of variants in clinical as well as research settings.
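As an illustration of what evaluating a pathogenic/benign classifier on an independent validation set can involve, the following is a minimal sketch in plain Python. The abstract does not state which metrics were reported, so the choice of sensitivity, specificity, and accuracy is an assumption, as are the function name `evaluate` and the label strings. The example labels are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Confusion-matrix counts, with 'pathogenic' treated as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        # Fraction of truly pathogenic variants that were predicted pathogenic.
        denom = self.tp + self.fn
        return self.tp / denom if denom else float("nan")

    @property
    def specificity(self) -> float:
        # Fraction of truly benign variants that were predicted benign.
        denom = self.tn + self.fp
        return self.tn / denom if denom else float("nan")

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else float("nan")


def evaluate(y_true, y_pred, positive="pathogenic"):
    """Compare reference labels with predicted labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return BinaryMetrics(tp=tp, fp=fp, tn=tn, fn=fn)


# Hypothetical labels for six validation variants (illustration only).
reference = ["pathogenic", "pathogenic", "benign", "benign", "pathogenic", "benign"]
predicted = ["pathogenic", "benign", "benign", "pathogenic", "pathogenic", "benign"]

metrics = evaluate(reference, predicted)
print(metrics)
print(f"sensitivity={metrics.sensitivity:.2f} "
      f"specificity={metrics.specificity:.2f} "
      f"accuracy={metrics.accuracy:.2f}")
```

The same function could be applied separately to the predictions of each model and to each validation set, so that the two algorithms are compared on identical variants.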
We have created a ready-to-deploy tool that classifies variants linked with Wilson's disease as pathogenic or benign. Clinicians and researchers can use it to better understand the disease through the nature of the genetic variants associated with it.