Deep neural network improves the estimation of polygenic risk scores for breast cancer.

Affiliations

School of Computer Science, University of Oklahoma, Norman, OK, USA.

Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, USA.

Publication information

J Hum Genet. 2021 Apr;66(4):359-369. doi: 10.1038/s10038-020-00832-7. Epub 2020 Oct 2.

Abstract

Polygenic risk scores (PRS) estimate the genetic risk of an individual for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimation of breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred. In the test cohort with 50% prevalence, the area under the receiver operating characteristic curve (AUC) was 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into two subpopulations: a high-genetic-risk case subpopulation with an average PRS significantly higher than that of the control population, and a normal-genetic-risk case subpopulation with an average PRS similar to that of the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p values by association studies but were important for DNN prediction. These variants may be associated with the phenotype through nonlinear relationships.
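The abstract reports precision and recall at two different prevalences. As a rough illustration of why precision depends on prevalence, the sketch below applies Bayes' rule to a fixed decision threshold. It assumes that recall and the false-positive rate at a given threshold do not change with prevalence. The function names are hypothetical, and this is not necessarily the extrapolation procedure the authors used.

```python
def fpr_from_precision(recall, precision, prevalence):
    """False-positive rate implied by a (recall, precision) pair at a given prevalence.

    Derived by solving
        precision = recall * prev / (recall * prev + fpr * (1 - prev))
    for fpr.
    """
    return recall * prevalence * (1.0 - precision) / (precision * (1.0 - prevalence))


def precision_at_prevalence(recall, fpr, prevalence):
    """Precision of a fixed-threshold classifier in a population with the given prevalence."""
    true_pos = recall * prevalence          # fraction of population: cases flagged
    false_pos = fpr * (1.0 - prevalence)    # fraction of population: controls flagged
    return true_pos / (true_pos + false_pos)


# Operating point reported for the test cohort (50% prevalence).
fpr_test = fpr_from_precision(recall=0.188, precision=0.90, prevalence=0.50)

# Same threshold applied to a population with 12% prevalence (assumption:
# recall and false-positive rate carry over unchanged).
precision_12 = precision_at_prevalence(recall=0.188, fpr=fpr_test, prevalence=0.12)

# False-positive rate implied by the general-population figures in the abstract
# (65.4% recall at 20% precision, 12% prevalence).
fpr_general = fpr_from_precision(recall=0.654, precision=0.20, prevalence=0.12)

print(f"implied FPR at the test-cohort threshold: {fpr_test:.4f}")
print(f"precision of that threshold at 12% prevalence: {precision_12:.3f}")
print(f"implied FPR at the general-population operating point: {fpr_general:.4f}")
```

Under these assumptions, the two implied false-positive rates differ substantially, which indicates that the 65.4% recall at 20% precision figure corresponds to a more permissive threshold than the one giving 18.8% recall at 90% precision, not to the same threshold re-evaluated at lower prevalence.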
