Rapid and accurate multi-phenotype imputation for millions of individuals.

Author information

Gu Lin-Lin, Wu Hong-Shan, Liu Tian-Yi, Zhang Yong-Jie, He Jing-Cheng, Liu Xiao-Lei, Wang Zhi-Yong, Chen Guo-Bo, Jiang Dan, Fang Ming

Affiliations

Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs & Fisheries College, Jimei University, Xiamen, Fujian, People's Republic of China.

Center for Data Science, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China.

Publication information

Nat Commun. 2025 Jan 4;16(1):387. doi: 10.1038/s41467-024-55496-0.

PMID: 39755672

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11700122/

Abstract

Deep phenotyping can enhance the power of genetic analyses, including genome-wide association studies (GWAS), but missing phenotypes compromise the potential of such resources. Although many phenotype imputation methods have been developed, accurately imputing phenotypes for millions of individuals remains challenging. In the present study, we have developed PIXANT, a multi-phenotype imputation method based on mixed fast random forest that leverages efficient machine learning (ML) algorithms. Extensive simulations demonstrate that PIXANT is reliable, robust and highly resource-efficient. We then apply PIXANT to UK Biobank (UKB) data on 277,301 unrelated White British individuals and 425 traits and perform GWAS on the imputed phenotypes, identifying 18.4% more GWAS loci than before imputation (8710 vs 7355). This gain in statistical power reveals additional candidate genes affecting heart rate, such as RNF220, SCN10A and RGS6, suggesting that imputed phenotype data from a large cohort may lead to the discovery of additional candidate genes for complex traits.
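The abstract only names PIXANT's approach (a mixed fast random forest) without giving its details. To make the general idea of random-forest-based multi-phenotype imputation concrete, the following is a minimal Python sketch of the related missForest scheme, not of PIXANT itself: every missing value is first filled with its phenotype's observed mean, then each phenotype with gaps is repeatedly re-predicted from all the other phenotypes by a small random forest (bootstrap-resampled regression trees, each split choosing among a random subset of predictors). All function names, the tree depth, forest size, number of rounds and the p/3 predictors-per-split rule are illustrative assumptions. The sketch ignores relatedness, covariates and any mixed-model component, assumes every phenotype has at least some observed values, and runs a fixed number of rounds instead of checking for convergence.

```python
import numpy as np

# Hypothetical helper names; a missForest-style sketch, NOT the PIXANT algorithm.


def fit_tree(X, y, rng, n_feat, depth=3, min_leaf=5):
    """Grow a small regression tree (nested dicts) by greedy variance reduction."""
    if depth == 0 or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return {"leaf": float(y.mean())}
    best = None  # (within-node sum of squares, feature index, threshold)
    # Random-forest twist: only a random subset of predictors is tried at each split
    for j in rng.choice(X.shape[1], size=n_feat, replace=False):
        # Candidate thresholds: a few quantiles instead of every distinct value
        for t in np.unique(np.quantile(X[:, j], np.linspace(0.1, 0.9, 9))):
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            sse = y[left].var() * n_left + y[~left].var() * (len(y) - n_left)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:
        return {"leaf": float(y.mean())}
    _, j, t = best
    left = X[:, j] <= t
    return {"feat": j, "thr": t,
            "left": fit_tree(X[left], y[left], rng, n_feat, depth - 1, min_leaf),
            "right": fit_tree(X[~left], y[~left], rng, n_feat, depth - 1, min_leaf)}


def predict_tree(node, x):
    """Follow one row x down to a leaf and return that leaf's mean."""
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["leaf"]


def fit_forest(X, y, rng, n_trees):
    """Bagged ensemble: each tree is grown on a bootstrap resample of the rows."""
    n_feat = max(1, X.shape[1] // 3)  # assumed rule: about p/3 predictors per split
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))
        trees.append(fit_tree(X[idx], y[idx], rng, n_feat))
    return trees


def predict_forest(trees, X):
    """Average the trees' predictions for every row of X."""
    return np.array([np.mean([predict_tree(t, x) for t in trees]) for x in X])


def impute_phenotypes(Y, n_iter=3, n_trees=10, seed=0):
    """Impute an individuals x phenotypes matrix in which NaN marks a missing value."""
    rng = np.random.default_rng(seed)
    Y = np.array(Y, dtype=float)
    miss = np.isnan(Y)
    # Step 0: fill every gap with its phenotype's observed mean
    Y = np.where(miss, np.nanmean(Y, axis=0), Y)
    # Visit phenotypes from least to most missing, for a fixed number of rounds
    order = np.argsort(miss.sum(axis=0))
    for _ in range(n_iter):
        for j in order:
            rows = miss[:, j]
            if not rows.any():
                continue
            others = np.delete(np.arange(Y.shape[1]), j)
            # Train on individuals whose phenotype j was observed ...
            forest = fit_forest(Y[~rows][:, others], Y[~rows, j], rng, n_trees)
            # ... then overwrite only the originally missing entries
            Y[rows, j] = predict_forest(forest, Y[rows][:, others])
    return Y
```

For example, `impute_phenotypes(Y)` takes an individuals × phenotypes NumPy array with `np.nan` marking missing entries and returns a completed copy in which observed entries are unchanged. The trees are grown from scratch with NumPy only to keep the example self-contained; a real pipeline at UKB scale would rely on an optimized random-forest implementation (for instance scikit-learn's `RandomForestRegressor`) rather than this pure-Python loop.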
