Rapid and accurate multi-phenotype imputation for millions of individuals.

Author information

Gu Lin-Lin, Wu Hong-Shan, Liu Tian-Yi, Zhang Yong-Jie, He Jing-Cheng, Liu Xiao-Lei, Wang Zhi-Yong, Chen Guo-Bo, Jiang Dan, Fang Ming

Affiliations

Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs & Fisheries College, Jimei University, Xiamen, Fujian, People's Republic of China.

Center for Data Science, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China.

Publication information

Nat Commun. 2025 Jan 4;16(1):387. doi: 10.1038/s41467-024-55496-0.

Abstract

Deep phenotyping can enhance the power of genetic analyses, including genome-wide association studies (GWAS), but missing phenotypes compromise the potential of such resources. Although many phenotype imputation methods have been developed, accurate imputation for millions of individuals remains challenging. In the present study, we develop PIXANT, a multi-phenotype imputation method based on a mixed fast random forest that leverages efficient machine learning (ML) algorithms. Extensive simulations demonstrate that PIXANT is reliable, robust and highly resource-efficient. We then apply PIXANT to UK Biobank (UKB) data comprising 277,301 unrelated White British citizens and 425 traits, and subsequently perform GWAS on the imputed phenotypes. This identifies 18.4% more GWAS loci than before imputation (8710 vs 7355). The increased statistical power of GWAS identifies additional candidate genes affecting heart rate, such as RNF220, SCN10A, and RGS6, suggesting that imputed phenotype data from a large cohort may lead to the discovery of additional candidate genes for complex traits.
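As a quick sanity check on the reported gain, the 18.4% figure can be reproduced from the two locus counts quoted in the abstract. The sketch below is plain arithmetic and is not part of PIXANT; the variable names are illustrative assumptions.

```python
# Locus counts quoted in the abstract (variable names are illustrative).
loci_before = 7355  # GWAS loci identified before imputation
loci_after = 8710   # GWAS loci identified after imputation

# Absolute and relative increase in identified loci.
extra_loci = loci_after - loci_before
relative_gain = extra_loci / loci_before

print(f"Additional loci: {extra_loci}")        # Additional loci: 1355
print(f"Relative gain: {relative_gain:.1%}")   # Relative gain: 18.4%
```

The gain is computed relative to the pre-imputation count, which matches the abstract's phrasing "more GWAS loci ... than before imputation".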

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2f/11700122/8690e50d3b3b/41467_2024_55496_Fig1_HTML.jpg