Genomic prediction with kinship-based multiple kernel learning produces hypothesis on the underlying inheritance mechanisms of phenotypic traits.
Author information
Raimondi Daniele, Verplaetse Nora, Passemiers Antoine, Jans Deborah Sarah, Cleynen Isabelle, Moreau Yves
Affiliations
Institut de Génétique Moléculaire de Montpellier (IGMM), CNRS-UMR5535, Université de Montpellier, Montpellier, 34293, France.
ESAT-STADIUS, KU Leuven, Leuven, 3001, Belgium.
Publication information
Genome Biol. 2025 Apr 4;26(1):84. doi: 10.1186/s13059-025-03544-3.
BACKGROUND
Genomic prediction encompasses the techniques used in agricultural technology to predict the genetic merit of individuals towards valuable phenotypic traits. It is related to Genome Interpretation in humans, which models the individual risk of developing disease traits. Genomic prediction is dominated by linear mixed models, such as the Genomic Best Linear Unbiased Prediction (GBLUP), which computes kinship matrices from SNP array data, while Genome Interpretation applications to clinical genetics rely mainly on Polygenic Risk Scores.
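As an illustration of how a kinship matrix can be computed from SNP array data, the sketch below builds an additive genomic relationship matrix using the widely used VanRaden-style formula. The function name `additive_kinship`, the toy genotype matrix, and the 0/1/2 allele-count coding are assumptions made for this example. The abstract does not specify which kinship formula the authors used.

```python
import numpy as np

def additive_kinship(genotypes):
    """VanRaden-style additive relationship matrix from 0/1/2 allele counts.

    Rows are individuals, columns are SNPs (assumed coding, not from the paper).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p = genotypes.mean(axis=0) / 2.0        # estimated allele frequency per SNP
    Z = genotypes - 2.0 * p                 # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))     # sum of expected SNP variances
    return Z @ Z.T / scale

# Toy data: 6 hypothetical individuals genotyped at 50 hypothetical SNPs.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(6, 50))
G = additive_kinship(X)
```

Because `G` is a scaled Gram matrix of the centred genotypes, it is symmetric and positive semidefinite. That property is what allows such a matrix to serve as a kernel.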
RESULTS
In this article, we exploit the positive semidefinite characteristics of the kinship matrices that are conventionally used in GBLUP to propose a novel Genomic Multiple Kernel Learning method (GMKL), in which the multiple kinship matrices corresponding to Additive, Dominant, and Epistatic Inheritance Mechanisms are used as kernels in support vector machines, and we apply it to both worlds. We benchmark GMKL on simulated cattle phenotypes, showing that it outperforms the classical GBLUP predictors for genomic prediction. Moreover, we show that GMKL ranks the kinship kernels representing different inheritance mechanisms according to their compatibility with the observed data, allowing it to produce hypotheses on the normally unknown inheritance mechanisms generating the target phenotypes. We then apply GMKL to the prediction of two inflammatory bowel disease cohorts with more than 6500 samples in total, consistently obtaining results suggesting that epistasis might have a relevant, although underestimated role in inflammatory bowel disease (IBD).
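The idea of treating several kinship matrices as kernels and weighting them by their compatibility with the data can be sketched as follows. This is not the authors' GMKL implementation, and it departs from it in two labelled ways. Kernel weights are set by centred kernel-target alignment, a simple multiple kernel learning heuristic. Kernel ridge regression replaces the support vector machine so that the example needs only NumPy. The kernel formulas (VanRaden-style additive, one common dominance coding, and a Hadamard product for additive-by-additive epistasis), the simulated data, and all function names are assumptions made for illustration.

```python
import numpy as np

def kinship_kernels(X):
    """Additive, dominance and additive-by-additive kernels for 0/1/2 genotypes."""
    p = X.mean(axis=0) / 2.0
    het = 2.0 * p * (1.0 - p)              # expected heterozygosity per SNP
    Z = X - 2.0 * p                        # centred allele counts
    H = (X == 1).astype(float) - het       # centred heterozygote indicator
    A = Z @ Z.T / het.sum()
    D = H @ H.T / np.sum(het * (1.0 - het))
    E = A * A                              # Hadamard product, PSD by the Schur product theorem
    return {"additive": A, "dominance": D, "epistatic": E}

def alignment(K, y):
    """Centred alignment between kernel K and the target kernel y y^T."""
    n = len(y)
    C = np.eye(n) - np.ones((n, n)) / n
    Kc = C @ K @ C
    yc = y - y.mean()
    return float(yc @ Kc @ yc / (np.linalg.norm(Kc) * (yc @ yc)))

# Toy data: 80 hypothetical individuals, 200 SNPs, purely additive phenotype.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(80, 200)).astype(float)
y = X @ rng.normal(size=200) + rng.normal(size=80)
train, test = np.arange(60), np.arange(60, 80)

# Kernels use all genotypes; only training phenotypes are used for the weights.
kernels = kinship_kernels(X)
scores = {name: max(alignment(K[np.ix_(train, train)], y[train]), 0.0)
          for name, K in kernels.items()}
total = sum(scores.values())
weights = {name: (s / total if total > 0 else 1.0 / len(scores))
           for name, s in scores.items()}
ranking = sorted(weights, key=weights.get, reverse=True)

# A non-negative weighted sum of PSD kernels is itself a PSD kernel.
K = sum(weights[name] * kernels[name] for name in kernels)

# Kernel ridge regression on the combined kernel (stand-in for the SVM).
lam = 1.0                                  # assumed regularisation strength
y_mean = y[train].mean()
alpha = np.linalg.solve(K[np.ix_(train, train)] + lam * np.eye(len(train)),
                        y[train] - y_mean)
y_pred = K[np.ix_(test, train)] @ alpha + y_mean
```

Here `ranking` lists the kernels from most to least aligned with the training phenotypes. This loosely mirrors how a kernel ranking could be read as a hypothesis about the inheritance mechanism, although alignment is only one of several ways to learn such weights.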
CONCLUSIONS
We show that GMKL performs similarly to GBLUP, but it can formulate biological hypotheses about inheritance mechanisms, such as suggesting that epistasis influences IBD.