Palheta Helber Gonzales Almeida, Gonçalves Wanderson Gonçalves, Brito Leonardo Miranda, Dos Santos Arthur Ribeiro, Dos Reis Matsumoto Marlon, Ribeiro-Dos-Santos Ândrea, de Araújo Gilderlanio Santana
Laboratory of Human and Medical Genetics, Graduate Program of Genetics and Molecular Biology, Institute of Biological Sciences, Federal University of Pará, Belém 66075-110, Brazil.
Research Center on Oncology, Graduate Program of Oncology and Medical Science, Federal University of Pará, Belém 66073-000, Brazil.
Biology (Basel). 2022 Mar 31;11(4):538. doi: 10.3390/biology11040538.
ClinVar is a web platform that stores ∼789,000 genetic associations with complex diseases. A subset of these cataloged associations has challenged clinicians and geneticists, often yielding conflicting interpretations or uncertain clinical significance. In this study, we addressed the (re)classification of genetic variants with AmazonForest, a random-forest-based pathogenicity metaprediction model that combines functional impact data from eight prediction tools. We also evaluated the performance of representation learning algorithms, such as autoencoders, to propose a better strategy. All metaprediction models were trained with ClinVar data, and genetic variants were annotated with eight functional impact predictors cataloged with SnpEff/SnpSift. AmazonForest implements the best-performing random forest model, which uses a one-hot data-encoding strategy and achieves an area under the ROC curve of ≥0.93. We applied AmazonForest to predict the pathogenicity of ∼101,000 genetic variants of uncertain significance or with conflicting interpretations. Our findings revealed ∼24,000 variants with high pathogenic probability (RFprob≥0.9). In addition, we show results for Alzheimer's disease to demonstrate the application of AmazonForest to the clinical interpretation of genetic variants in complex diseases. Lastly, AmazonForest is available as a web tool and as an R object that can be loaded to perform pathogenicity predictions.
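To make the one-hot encoding and the RFprob≥0.9 cutoff concrete, the following is a minimal sketch and not the AmazonForest implementation. The predictor names (`tool_1` to `tool_8`), the category labels (`D`, `T`, `NA`), and both function names are hypothetical, since the abstract does not list the eight predictors or their output categories. Random forest training is omitted; a classifier would take the encoded vectors as input and return the probabilities that are filtered here.

```python
# Hypothetical placeholders: the abstract names neither the eight predictors
# nor the categories each one emits.
PREDICTORS = [f"tool_{i}" for i in range(1, 9)]
CATEGORIES = ("D", "T", "NA")  # assumed: deleterious, tolerated, missing


def one_hot_encode(calls):
    """Turn one variant's per-predictor calls into a flat 0/1 vector.

    `calls` maps a predictor name to a category label; predictors that are
    absent from the mapping are treated as missing ("NA").
    """
    vector = []
    for name in PREDICTORS:
        label = calls.get(name, "NA")
        if label not in CATEGORIES:
            raise ValueError(f"unknown category {label!r} for {name}")
        # Exactly one position per predictor is set to 1.
        vector.extend(1 if label == category else 0 for category in CATEGORIES)
    return vector


def high_confidence(probabilities, threshold=0.9):
    """Return the variant IDs whose pathogenic probability meets the cutoff."""
    return [vid for vid, prob in probabilities.items() if prob >= threshold]


if __name__ == "__main__":
    encoded = one_hot_encode({"tool_1": "D", "tool_2": "T"})
    print(len(encoded), sum(encoded))
    # Made-up probabilities standing in for random forest output.
    print(high_confidence({"var_a": 0.95, "var_b": 0.42, "var_c": 0.90}))
```

Each variant becomes a vector of 8 × 3 = 24 binary features under these assumed categories, with exactly one feature active per predictor, so a missing prediction is encoded as its own category rather than imputed.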