Gu Lin-Lin, Wu Hong-Shan, Liu Tian-Yi, Zhang Yong-Jie, He Jing-Cheng, Liu Xiao-Lei, Wang Zhi-Yong, Chen Guo-Bo, Jiang Dan, Fang Ming
Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs & Fisheries College, Jimei University, Xiamen, Fujian, People's Republic of China.
Center for Data Science, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China.
Nat Commun. 2025 Jan 4;16(1):387. doi: 10.1038/s41467-024-55496-0.
Deep phenotyping can enhance the power of genetic analyses, including genome-wide association studies (GWAS), but missing phenotypes compromise the potential of such resources. Although many phenotypic imputation methods have been developed, accurate imputation for millions of individuals remains challenging. In the present study, we developed PIXANT, a multi-phenotype imputation method based on a mixed fast random forest that leverages efficient machine learning (ML) algorithms. We demonstrate through extensive simulations that PIXANT is reliable, robust and highly resource-efficient. We then apply PIXANT to UK Biobank (UKB) data comprising 277,301 unrelated White British citizens and 425 traits, and subsequently perform GWAS on the imputed phenotypes. This identifies 18.4% more GWAS loci than before imputation (8710 vs 7355). The increased statistical power of GWAS revealed additional candidate genes affecting heart rate, such as RNF220, SCN10A, and RGS6, suggesting that imputed phenotype data from a large cohort may lead to the discovery of additional candidate genes for complex traits.
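As a quick check of the reported gain, the following minimal sketch recomputes the percentage increase in GWAS loci from the two counts given in the abstract (8710 after imputation, 7355 before). The function name `relative_gain` and the variable names are illustrative assumptions and are not part of PIXANT.

```python
def relative_gain(before: int, after: int) -> float:
    """Return the percentage increase from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (after - before) / before


# Counts taken from the abstract; the names are hypothetical.
loci_before = 7355  # GWAS loci identified before imputation
loci_after = 8710   # GWAS loci identified after imputation

extra_loci = loci_after - loci_before
gain_pct = relative_gain(loci_before, loci_after)

print(f"{extra_loci} additional loci, a {gain_pct:.1f}% increase")
```

The result agrees with the 18.4% figure stated in the abstract, which is measured relative to the pre-imputation count.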