VEPAD - Predicting the effect of variants associated with Alzheimer's disease using machine learning.

Affiliations

Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India.

Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India; School of Computing, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Kanagawa, 226-8503, Yokohama, Japan.

Publication information

Comput Biol Med. 2020 Sep;124:103933. doi: 10.1016/j.compbiomed.2020.103933. Epub 2020 Aug 5.

DOI:10.1016/j.compbiomed.2020.103933

PMID:32828070

Abstract

INTRODUCTION

Alzheimer's disease (AD) is a complex and heterogeneous disease that progressively affects neuronal cells and is the most prevalent of the neurodegenerative diseases. Next Generation Sequencing (NGS) techniques are widely used to develop high-throughput screening methods for identifying biomarkers and variants, which aid early diagnosis and treatment.

OBJECTIVE

The primary purpose of this study is to develop a classification model using machine learning for predicting the deleterious effect of variants with respect to AD.

METHODS

We constructed a dataset of 20,401 deleterious variants from the Genome-Wide Association Study (GWAS) portal and 37,452 control variants from the Genotype-Tissue Expression (GTEx) portal. Recursive feature elimination with cross-validation (RFECV), followed by forward feature selection, was used to select the important features, and a random forest classifier was used to distinguish deleterious from neutral variants.
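The selection-then-classification workflow described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the data are synthetic (generated with `make_classification`), and the feature count, forest size, 3-fold split, and the choice to keep 3 features in the forward step are all assumptions made to keep the example small and fast. The abstract itself reports 10-fold cross-validation and does not state these settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SequentialFeatureSelector
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for a variant feature matrix (NOT the VEPAD data).
# The roughly 65/35 class mix loosely mirrors the control/deleterious ratio.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=4, n_redundant=2,
    weights=[0.65, 0.35], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

# Assumption: 3 folds for speed; the abstract reports 10-fold cross-validation.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)


def make_forest():
    """Small random forest; the size is an illustrative assumption."""
    return RandomForestClassifier(n_estimators=20, random_state=0)


# Step 1: recursive feature elimination with cross-validation (RFECV).
rfecv = RFECV(
    estimator=make_forest(), step=1, cv=cv,
    scoring="accuracy", min_features_to_select=4,
)
rfecv.fit(X_train, y_train)
X_train_rfe = rfecv.transform(X_train)
X_test_rfe = rfecv.transform(X_test)

# Step 2: forward selection among the features that survived RFECV.
# Keeping 3 features is a hypothetical choice, not a value from the paper.
sfs = SequentialFeatureSelector(
    make_forest(), n_features_to_select=3,
    direction="forward", scoring="accuracy", cv=cv,
)
sfs.fit(X_train_rfe, y_train)
X_train_sel = sfs.transform(X_train_rfe)
X_test_sel = sfs.transform(X_test_rfe)

# Step 3: random forest on the selected features, scored two ways.
forest = make_forest()
cv_accuracy = cross_val_score(
    forest, X_train_sel, y_train, cv=cv, scoring="accuracy"
).mean()
forest.fit(X_train_sel, y_train)
test_accuracy = accuracy_score(y_test, forest.predict(X_test_sel))

print(f"features kept by RFECV: {rfecv.n_features_}")
print(f"cross-validation accuracy: {cv_accuracy:.3f}")
print(f"held-out test accuracy: {test_accuracy:.3f}")
```

In this sketch the cross-validation score is computed on the same data that drove feature selection, so it tends to be optimistic; the held-out test score is the less biased of the two numbers.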

RESULTS

Our method achieved an accuracy of 81.21% in 10-fold cross-validation and 70.63% on a test set of 5,785 variants. On the same test set, CADD and FATHMM achieved accuracies in the range of 54%-62%.
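To put the reported percentages in terms of counts, the short calculation below converts them using only the figures quoted in the abstract. The majority-class baseline is an assumption: it supposes the test set has the same class ratio as the full dataset, which the abstract does not state.

```python
# Figures quoted in the abstract.
n_deleterious = 20_401
n_control = 37_452
n_test = 5_785
test_accuracy = 0.7063

n_total = n_deleterious + n_control
approx_correct = round(test_accuracy * n_test)

# ASSUMPTION: the test set shares the class ratio of the full dataset.
# Under that assumption, always predicting "control" would score this much.
majority_baseline = n_control / n_total

print(f"total variants: {n_total}")                         # 57853
print(f"approx. correct test predictions: {approx_correct}")  # 4086
print(f"assumed majority-class baseline: {majority_baseline:.2%}")
```

Because the classes are imbalanced, accuracy alone is easier to interpret alongside such a baseline; the true baseline depends on the actual composition of the test set.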

CONCLUSION

Our model is freely available as the Variant Effect Predictor for Alzheimer's Disease (VEPAD) at http://web.iitm.ac.in/bioinfo2/vepad/. VEPAD can be used to predict the effect of new variants associated with AD.
