Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery.

Author information

Gentry Amanda Elswick, Kirkpatrick Robert M, Peterson Roseann E, Webb Bradley T

Affiliations

Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, United States.

Department of Psychiatry and Behavioral Sciences, Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, United States.

Publication information

Front Genet. 2023 Jul 20;14:1162690. doi: 10.3389/fgene.2023.1162690. eCollection 2023.

Abstract

The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures offers a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is significantly impaired by the block-wise missingness inherent to many biobanks. To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on missingness, followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (>500k participants) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were high (>0.86). Phenotypic and genetic correlations in the real data application, as well as in simulations, demonstrate that the method has significant accuracy and utility for increasing power for genetic loci discovery.
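
To make the first step of the abstract concrete, the following is a minimal sketch of clustering variables by their missingness patterns. It is not the authors' implementation: the Jaccard distance on missingness indicators, the single-linkage merging rule, the 0.5 threshold, the function names, and the toy variables are all assumptions chosen for illustration. The Group LASSO and stepwise model-building steps are omitted.

```python
from itertools import combinations

def missing_mask(column):
    """Return a tuple of booleans: True where a value is missing (None)."""
    return tuple(v is None for v in column)

def jaccard_distance(a, b):
    """Jaccard distance between two missingness masks (assumed metric)."""
    union = sum(x or y for x, y in zip(a, b))
    if union == 0:
        return 0.0  # neither variable has any missing values
    shared = sum(x and y for x, y in zip(a, b))
    return 1.0 - shared / union

def cluster_by_missingness(data, threshold=0.5):
    """Single-linkage agglomerative clustering of variables.

    data: dict mapping variable name -> list of values (None = missing).
    Clusters are merged until the closest pair is farther apart than
    `threshold` (a hypothetical cut-off, not taken from the paper).
    """
    masks = {name: missing_mask(col) for name, col in data.items()}
    clusters = [[name] for name in data]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(jaccard_distance(masks[a], masks[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Toy data with block-wise missingness: two questionnaires answered by
# different subsets of six subjects, plus two fully observed variables.
toy = {
    "age":     [50, 61, 47, 58, 66, 39],
    "sex":     [0, 1, 1, 0, 1, 0],
    "diet_q1": [3, None, None, 2, None, 4],
    "diet_q2": [1, None, None, 5, None, 2],
    "mh_q1":   [None, None, 2, 1, 0, None],
    "mh_q2":   [None, None, 3, 1, 1, None],
}
clusters = cluster_by_missingness(toy)
print(clusters)
# [['age', 'sex'], ['diet_q1', 'diet_q2'], ['mh_q1', 'mh_q2']]
```

In this toy example, variables from the same questionnaire are missing for the same subjects, so they fall into the same cluster. In a full pipeline along the lines the abstract describes, a penalized model would then be fit within each such cluster.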

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e5a/10399453/217b600d8090/fgene-14-1162690-g001.jpg