Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA.
Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA.
Transl Vis Sci Technol. 2021 Feb 5;10(2):29. doi: 10.1167/tvst.10.2.29.
Because age-related macular degeneration (AMD) is a progressive disorder and advanced AMD is currently hard to cure, an accurate and informative prediction of a person's AMD risk using genetic information is desirable for early diagnosis and potential individualized clinical management. The objective of this study was to develop and validate novel prediction models for AMD risk by applying different machine learning approaches to large genome-wide association study datasets.
Genotype data from 32,215 Caucasian individuals aged ≥50 years from the International AMD Genomics Consortium, available in dbGaP, were used to establish and test prediction models for AMD risk. Four machine learning approaches were implemented: neural network, lasso regression, support vector machine, and random forest. A standard logistic regression model using a genetic risk score was also considered.
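To illustrate the baseline model mentioned above, the following is a minimal sketch of a logistic regression on a genetic risk score plus age. It assumes the common convention that the score is a weighted sum of risk-allele counts with per-variant log odds ratios as weights; the abstract does not specify how the score was built. All function names, weights, and coefficients here are hypothetical and are not estimates from the study.

```python
import math


def genetic_risk_score(allele_counts, log_odds_weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Assumption: weights are per-variant log odds ratios; the study's
    actual scoring scheme is not described in the abstract.
    """
    if len(allele_counts) != len(log_odds_weights):
        raise ValueError("one weight per variant is required")
    return sum(c * w for c, w in zip(allele_counts, log_odds_weights))


def predicted_risk(grs, age, intercept, beta_grs, beta_age):
    """Logistic model: P(case) = 1 / (1 + exp(-(b0 + b1*GRS + b2*age)))."""
    z = intercept + beta_grs * grs + beta_age * age
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for illustration only.
weights = [0.9, 0.4, -0.3]   # made-up per-variant log odds ratios
genotype = [2, 1, 0]         # risk-allele counts for one person
grs = genetic_risk_score(genotype, weights)
risk = predicted_risk(grs, age=70, intercept=-6.0, beta_grs=1.0, beta_age=0.05)
print(f"GRS = {grs:.2f}, predicted risk = {risk:.3f}")
```

In practice the coefficients would be fitted on training data rather than set by hand; the sketch only shows how genotype and age could combine into a single probability.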
All machine learning-based methods achieved satisfactory performance in a separate test dataset for predicting advanced AMD versus normal controls (area under the curve [AUC] = 0.81-0.82, Brier score = 0.17-0.18) and any-stage AMD versus normal controls (AUC = 0.78-0.79, Brier score = 0.18-0.20). The prediction performance was further validated in an independent dataset of 783 subjects from UK Biobank (AUC = 0.67).
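For readers unfamiliar with the two reported metrics, the sketch below computes the area under the ROC curve and the Brier score from scratch. The area under the curve is taken as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control (ties count one half), and the Brier score is the mean squared difference between predicted probability and the 0/1 outcome. The labels and probabilities are made-up toy values, not study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise case/control comparison."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                wins += 1.0   # case ranked above control
            elif pc == pn:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Toy example: 1 = AMD case, 0 = control (illustrative values only).
y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc_value = auc(y_true, y_prob)
brier_value = brier_score(y_true, y_prob)
print(f"AUC = {auc_value:.3f}, Brier = {brier_value:.3f}")
```

A higher area under the curve indicates better discrimination, whereas a lower Brier score indicates better-calibrated and more accurate probabilities, so the two metrics are complementary.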
By applying multiple state-of-the-art machine learning approaches to large AMD genome-wide association study datasets, we established predictive models that can provide an accurate estimate of an individual's AMD risk profile based on genetic information along with age. The online prediction interface is available at: https://yanq.shinyapps.io/no_vs_amd_NN/.
The accurate and individualized risk prediction interface is expected to support earlier diagnosis and more tailored clinical management of AMD.
杨乾