A multivariable approach for risk markers from pooled molecular data with only partial overlap.

Affiliations

Forest Research Institute Baden-Württemberg (FVA), Wonnhaldestraße 4, Freiburg, 79100, Germany.

Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Stefan-Meier-Straße 26, Freiburg, 79104, Germany.

Publication information

BMC Med Genet. 2019 Jul 19;20(1):128. doi: 10.1186/s12881-019-0849-0.

Abstract

BACKGROUND

Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, but the measurements available from the different studies overlap only partially. In genome-wide association studies, univariate analyses of such markers are routinely performed in this setting using meta-analysis techniques to identify genetic risk scores. In contrast, multivariable techniques such as regularized regression, which might be more powerful, are hampered by the partial overlap of available markers even when pooling individual-level data is feasible. This cannot easily be addressed at the preprocessing stage, as quality criteria in the different studies may result in differential availability of markers, even after imputation.

METHODS

Motivated by data from the InterLymph Consortium on risk factors for non-Hodgkin lymphoma, which exhibit these challenges, we adapted a regularized regression approach, componentwise boosting, to deal with partial overlap in single nucleotide polymorphisms (SNPs). This synthesis regression approach is combined with resampling to determine stable sets of SNPs, which could feed into a genetic risk score. The proposed approach is contrasted with univariate analyses, an application of the lasso, and an analysis that discards the studies causing the partial overlap. The question of statistical significance is addressed with an approach called stability selection.
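To make the idea concrete, the following is a minimal sketch of componentwise boosting combined with resampling. It is not the authors' implementation, which the abstract does not describe in detail. It assumes a continuous outcome with squared-error loss for brevity (the study outcome is case/control status), marks unmeasured markers as NaN, and updates each marker from its available cases only. The function names, the simulated genotypes, and all tuning values (`n_steps`, `nu`, `n_resamples`) are hypothetical.

```python
import numpy as np

def componentwise_boost(X, y, n_steps=20, nu=0.1):
    """Componentwise L2 boosting; NaN in X marks a marker that was not measured."""
    observed = ~np.isnan(X)
    # Center each marker on its observed mean. Unmeasured entries are set to 0,
    # so every sum below effectively runs over available cases only.
    Xc = np.where(observed, X - np.nanmean(X, axis=0), 0.0)
    sum_sq = (Xc ** 2).sum(axis=0)
    beta = np.zeros(X.shape[1])
    resid = y - y.mean()
    for _ in range(n_steps):
        cross = Xc.T @ resid
        # Reduction in residual sum of squares from a univariate fit of each marker.
        score = np.divide(cross ** 2, sum_sq,
                          out=np.zeros_like(cross), where=sum_sq > 0)
        j = int(np.argmax(score))
        if score[j] == 0:
            break  # no marker improves the fit any further
        step = nu * cross[j] / sum_sq[j]  # shrunken univariate update
        beta[j] += step
        resid = resid - step * Xc[:, j]
    return beta

def inclusion_frequencies(X, y, n_resamples=30, seed=0, **kwargs):
    """Share of half-size subsamples in which each marker receives a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += componentwise_boost(X[idx], y[idx], **kwargs) != 0
    return counts / n_resamples

# Simulated example (assumption): two pooled studies, ten SNPs coded 0/1/2.
rng = np.random.default_rng(1)
n_a, n_b, p = 300, 100, 10
X = rng.binomial(2, 0.3, size=(n_a + n_b, p)).astype(float)
y = X[:, 0] + X[:, 7] + rng.normal(size=n_a + n_b)
X[n_a:, 5:] = np.nan  # hypothetical study B did not measure markers 5 to 9

beta = componentwise_boost(X, y)
freq = inclusion_frequencies(X, y)
print(np.round(freq, 2))
```

In this sketch, individuals from the second study still contribute to the updates for markers 0 to 4, rather than being discarded as in a complete-case analysis. Markers whose inclusion frequency exceeds a chosen threshold would form the stable set; the abstract does not state which threshold was used.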

RESULTS

Using an excerpt of the data from the InterLymph Consortium on two specific subtypes of non-Hodgkin lymphoma, it is shown that componentwise boosting can take into account all applicable information from different SNPs, irrespective of whether they are covered by all investigated studies and for all individuals in the single studies. The results indicate increased power, even when studies that would be discarded in a complete case analysis only comprise a small proportion of individuals.

CONCLUSIONS

Given the observed gains in power, the proposed approach can be recommended more generally whenever there is only partial overlap of molecular measurements obtained from pooled studies and/or missing data in single studies. A corresponding software implementation is available upon request.

TRIAL REGISTRATION

All involved studies have provided signed GWAS data submission certifications to the U.S. National Institutes of Health and have been retrospectively registered.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0ef/6642584/b1949ad21488/12881_2019_849_Fig1_HTML.jpg
