Integrative analysis of individual-level data and high-dimensional summary statistics.

Affiliations

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA.

School of Statistics and Data Science, Nankai University, Tianjin 300071, China.

Publication information

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad156.

DOI:10.1093/bioinformatics/btad156
PMID:36964712
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10361352/
Abstract

MOTIVATION

Researchers usually conduct statistical analyses based on models built on raw data collected from individual participants (individual-level data). There is a growing interest in enhancing inference efficiency by incorporating aggregated summary information from other sources, such as summary statistics on genetic markers' marginal associations with a given trait generated from genome-wide association studies. However, combining high-dimensional summary data with individual-level data using existing integrative procedures can be challenging due to various numeric issues in optimizing an objective function over a large number of unknown parameters.

RESULTS

We develop a procedure to improve the fitting of a targeted statistical model by leveraging external summary data for more efficient statistical inference (both effect estimation and hypothesis testing). To make this procedure scalable to high-dimensional summary data, we propose a divide-and-conquer strategy by breaking the task into easier parallel jobs, each fitting the targeted model by integrating the individual-level data with a small proportion of summary data. We obtain the final estimates of model parameters by pooling results from multiple fitted models through the minimum distance estimation procedure. We improve the procedure for a general class of additive models commonly encountered in genetic studies. We further expand these two approaches to integrate individual-level and high-dimensional summary data from different study populations. We demonstrate the advantage of the proposed methods through simulations and an application to the study of the effect on pancreatic cancer risk by the polygenic risk score defined by BMI-associated genetic markers.
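The pooling step described above can be illustrated with a generic minimum distance estimator. The following is a minimal sketch, not the MetaGIM implementation: it assumes each of K parallel jobs returns an estimate of the same p-dimensional parameter, and that a joint covariance matrix of the stacked estimates is available (the jobs share the individual-level data, so their estimates are generally correlated). The function name `minimum_distance_pool` and the toy numbers are hypothetical.

```python
import numpy as np


def minimum_distance_pool(estimates, joint_cov):
    """Pool K estimates of one p-vector by generic minimum distance estimation.

    estimates : array-like, shape (K, p), one row per parallel job
    joint_cov : array-like, shape (K*p, K*p), joint covariance of the
                stacked estimates (assumed known or already estimated)

    Minimises (t - A @ theta)' Sigma^{-1} (t - A @ theta), where t stacks
    the K estimates and A stacks K identity matrices. The closed form is
    theta = (A' Sigma^{-1} A)^{-1} A' Sigma^{-1} t.
    """
    estimates = np.asarray(estimates, dtype=float)
    joint_cov = np.asarray(joint_cov, dtype=float)
    K, p = estimates.shape
    stacked = estimates.reshape(K * p)        # t = (theta_1, ..., theta_K)
    A = np.tile(np.eye(p), (K, 1))            # maps theta to its K stacked copies
    W = np.linalg.solve(joint_cov, A)         # Sigma^{-1} A, shape (K*p, p)
    pooled_cov = np.linalg.inv(A.T @ W)       # (A' Sigma^{-1} A)^{-1}
    pooled = pooled_cov @ (W.T @ stacked)     # uses symmetry of Sigma
    return pooled, pooled_cov


# Toy illustration (made-up numbers): two jobs, one parameter,
# uncorrelated estimates with variances 1 and 3.
toy_pooled, toy_cov = minimum_distance_pool([[1.0], [3.0]], np.diag([1.0, 3.0]))
```

With uncorrelated estimates the estimator reduces to inverse-variance weighting, so the toy call gives a pooled value of about 1.5 with variance about 0.75. Off-diagonal blocks in `joint_cov` are what let the estimator account for jobs that reuse the same individual-level data.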

AVAILABILITY AND IMPLEMENTATION

An R package is available at https://github.com/fushengstat/MetaGIM.
