Genetic Prediction Modeling in Large Cohort Studies via Boosting Targeted Loss Functions.

Affiliations

Institute of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Bonn, Germany.

Institute of Genomic Statistics and Bioinformatics, Medical Faculty, University of Bonn, Bonn, Germany.

Publication information

Stat Med. 2024 Dec 10;43(28):5412-5430. doi: 10.1002/sim.10249. Epub 2024 Oct 23.

Abstract

Polygenic risk scores (PRS) aim to predict a trait from genetic information, relying on common genetic variants with low to medium effect sizes. As genotype data are high-dimensional in nature, it is crucial to develop methods that can be applied to large-scale data (large n and large p). Many PRS tools aggregate univariate summary statistics from genome-wide association studies into a single score. Recent advancements allow simultaneous modeling of variant effects from individual-level genotype data. In this context, we introduced snpboost, an algorithm that applies statistical boosting to individual-level genotype data to estimate PRS via multivariable regression models. By processing variants iteratively in batches, snpboost can deal with large-scale cohort data. Having solved the technical obstacles due to data dimensionality, the methodological scope can now be broadened, focusing on key objectives for the clinical application of PRS. Similar to most methods in this context, snpboost has, so far, been restricted to quantitative and binary traits. Now, we incorporate more advanced alternatives targeted to the particular aim and outcome. Adapting the loss function extends the snpboost framework to further data situations such as time-to-event and count data. Furthermore, alternative loss functions for continuous outcomes, for example median or quantile regression, allow us to focus not only on the mean of the conditional distribution but also on other aspects that may be more helpful in the risk stratification of individual patients and that can quantify prediction uncertainty. This work enables PRS fitting across multiple model classes that were previously infeasible for this data type.
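
The idea of swapping the loss function in a boosting algorithm can be illustrated with a small sketch. The code below is not the snpboost implementation and omits its batch processing of variants; it is a generic component-wise gradient boosting loop on simulated data, assuming simple linear base learners, one per variant. All function names, parameters, and simulation settings are hypothetical and chosen for illustration only. The only part that changes between mean regression and median (quantile) regression is the negative gradient of the loss.

```python
import numpy as np

def neg_gradient_squared(y, f):
    # Negative gradient of the squared error loss 0.5 * (y - f)^2.
    return y - f

def make_neg_gradient_quantile(tau):
    # Negative gradient of the pinball (quantile) loss for quantile level tau.
    def neg_gradient(y, f):
        return np.where(y > f, tau, tau - 1.0)
    return neg_gradient

def componentwise_boost(X, y, neg_gradient, offset, n_iter=200, step=0.1):
    """Illustrative component-wise gradient boosting with linear base learners.

    In each iteration the negative gradient is fitted by least squares on
    every (centered) column separately, and only the best-fitting column
    receives a small coefficient update.
    """
    n, p = X.shape
    col_means = X.mean(axis=0)
    Xc = X - col_means
    col_ss = (Xc ** 2).sum(axis=0)
    col_ss = np.where(col_ss > 0, col_ss, np.inf)  # ignore constant columns
    beta = np.zeros(p)
    f = np.full(n, offset, dtype=float)
    for _ in range(n_iter):
        u = neg_gradient(y, f)
        num = Xc.T @ u
        coefs = num / col_ss               # per-column least squares slope
        j = int(np.argmax(num ** 2 / col_ss))  # largest reduction in RSS
        beta[j] += step * coefs[j]
        f += step * coefs[j] * Xc[:, j]
    intercept = offset - col_means @ beta  # back to the uncentered scale
    return intercept, beta

# Simulated genotypes coded 0/1/2 (assumed allele frequency 0.3) and a
# continuous trait influenced by two hypothetical variants.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
y = 1.0 * X[:, 3] - 0.8 * X[:, 10] + rng.normal(size=n)

# Mean regression: squared error loss, offset at the sample mean.
_, beta_ls = componentwise_boost(X, y, neg_gradient_squared, offset=y.mean())

# Median regression: pinball loss with tau = 0.5, offset at the sample median.
_, beta_med = componentwise_boost(
    X, y, make_neg_gradient_quantile(0.5), offset=np.quantile(y, 0.5)
)
```

Under these assumptions, other outcome types would be handled the same way, by supplying the negative gradient of the corresponding loss (for example a Poisson log-likelihood for counts) while leaving the selection and update steps unchanged.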


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d7/11586906/fe3f69542e3b/SIM-43-5412-g008.jpg
