randPedPCA: rapid approximation of principal components from large pedigrees.

Authors

Lee Hanbin, Craddock Rosalind Françoise, Gorjanc Gregor, Becher Hannes

Affiliations

Department of Statistics, University of Michigan, Ann Arbor, MI, 48109, USA.

The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK.

Publication information

Genet Sel Evol. 2025 Aug 28;57(1):46. doi: 10.1186/s12711-025-00994-y.

DOI:10.1186/s12711-025-00994-y
PMID:40877802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12392600/

Abstract

BACKGROUND

Pedigrees continue to be extremely important in agriculture and conservation genetics, with the pedigrees of modern breeding programmes easily comprising millions of records. This size can make visualising the structure of such pedigrees challenging. Being graphs, pedigrees can be represented as matrices, including, most commonly, the additive (numerator) relationship matrix, $\mathbf{A}$, and its inverse. With these matrices, the structure of pedigrees can then, in principle, be visualised via principal component analysis (PCA). However, the naive PCA of matrices for large pedigrees is challenging due to computational and memory constraints. Furthermore, computing a few leading principal components is usually sufficient for visualising the structure of a pedigree.
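
As a rough illustration of the naive approach described above, the following Python sketch builds a dense additive relationship matrix for a five-animal toy pedigree with the standard tabular method and then takes its leading eigenvectors. The function names `relationship_matrix` and `naive_pca`, the `-1` coding for unknown parents, and the toy pedigree are assumptions made for this example and are not taken from the randPedPCA package. No centring is applied here. Storing the dense n × n matrix is what becomes infeasible for millions of records.

```python
import numpy as np

def relationship_matrix(sire, dam):
    """Dense additive relationship matrix via the tabular method.

    Assumes individuals are numbered 0..n-1 with parents listed before
    their offspring, and that -1 marks an unknown parent.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        for j in range(i):
            # Relationship with an earlier animal: mean over known parents.
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
        # Diagonal is 1 + inbreeding coefficient (half the parents' relationship).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

def naive_pca(A, k=2):
    """Leading k principal component scores from a full eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vecs[:, order] * np.sqrt(vals[order])

# Toy pedigree: two founders, two full sibs, and one offspring of the sibs.
sire = [-1, -1, 0, 0, 2]
dam = [-1, -1, 1, 1, 3]
A = relationship_matrix(sire, dam)
scores = naive_pca(A, k=2)
```

Both the memory for `A` and the cost of the full eigendecomposition grow quickly with pedigree size, which is the constraint the abstract refers to.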

RESULTS

We present the open-access R package randPedPCA for rapid pedigree PCA using sparse matrices. Our rapid pedigree PCA builds on the fact that matrix-vector multiplications with the additive relationship matrix can be carried out implicitly using the extremely sparse inverse relationship factor, $\mathbf{L}^{-1}$, which can be directly obtained from a given pedigree. We implemented two methods. Randomised singular value decomposition tends to be faster when very few principal components are requested, and eigendecomposition via the RSpectra library tends to be faster when more principal components are of interest. On simulated data, our package delivers a speed-up greater than 10,000 times compared to naive PCA. It further enables analyses that are impossible with naive PCA. When only two principal components are desired, the randomised PCA method can halve the running time required compared to RSpectra, which we demonstrate by analysing the pedigree of the UK Kennel Club registered Labrador Retriever population of almost 1.5 million individuals.
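
The idea of multiplying with the relationship matrix implicitly can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the factorisation $\mathbf{A} = \mathbf{L}\mathbf{L}^{\top}$, builds the sparse inverse factor from parent indices (`-1` for unknown), takes the parents' inbreeding coefficients `F` as an input that defaults to zero, and uses a generic randomised SVD with a few power iterations. All function names (`inverse_factor`, `apply_L`, `apply_Lt`, `rand_ped_pca`) are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def inverse_factor(sire, dam, F=None):
    """Sparse lower-triangular inverse factor, with A = L L^T.

    Assumes parents precede offspring and -1 marks an unknown parent.
    F holds inbreeding coefficients; the zero default is an assumption
    that is only valid when no parent is itself inbred.
    """
    n = len(sire)
    F = np.zeros(n) if F is None else np.asarray(F, dtype=float)
    rows, cols, vals = [], [], []
    for i in range(n):
        known = [p for p in (sire[i], dam[i]) if p >= 0]
        # Mendelian sampling variance of individual i.
        d = 1.0 - 0.25 * len(known) - 0.25 * sum(F[p] for p in known)
        scale = 1.0 / np.sqrt(d)
        rows.append(i); cols.append(i); vals.append(scale)
        for p in known:
            rows.append(i); cols.append(p); vals.append(-0.5 * scale)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

def apply_L(Linv, X):
    """Compute L @ X by solving the sparse triangular system Linv Y = X."""
    return spsolve_triangular(Linv, X, lower=True)

def apply_Lt(Linv, X):
    """Compute L.T @ X by solving the transposed (upper-triangular) system."""
    return spsolve_triangular(Linv.T.tocsr(), X, lower=False)

def rand_ped_pca(Linv, k=2, oversample=10, n_iter=2, seed=0):
    """Randomised SVD of L without ever forming L or A explicitly."""
    n = Linv.shape[0]
    rng = np.random.default_rng(seed)
    Y = apply_L(Linv, rng.standard_normal((n, k + oversample)))
    for _ in range(n_iter):                      # power iterations with A = L L^T
        Q, _ = np.linalg.qr(Y)
        Y = apply_L(Linv, apply_Lt(Linv, Q))
    Q, _ = np.linalg.qr(Y)
    B = apply_Lt(Linv, Q).T                      # B = Q^T L, a small matrix
    Ub, S, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    # Scores are U * S; squared singular values approximate eigenvalues of A.
    return U[:, :k] * S[:k], S[:k] ** 2

# Toy pedigree: two founders, two full sibs, and one offspring of the sibs.
sire = [-1, -1, 0, 0, 2]
dam = [-1, -1, 1, 1, 3]
Linv = inverse_factor(sire, dam)
scores, eigenvalues = rand_ped_pca(Linv, k=2)
```

Each row of the inverse factor has at most three non-zero entries, so every implicit product costs roughly linear time in the number of individuals. An iterative eigensolver could be driven by the same two products, which is the role the abstract assigns to RSpectra.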

CONCLUSIONS

The leading principal components of pedigree matrices can be efficiently obtained using randomised singular value decomposition and other methods. Scatter plots of these scores allow for intuitive visualisation of large pedigrees. For large pedigrees, this is considerably faster than rendering plots of a pedigree graph.
