Comparing feature selection and machine learning approaches for predicting methylation from genetic variation.

Authors

Fong Wei Jing, Tan Hong Ming, Garg Rishabh, Teh Ai Ling, Pan Hong, Gupta Varsha, Krishna Bernadus, Chen Zou Hui, Purwanto Natania Yovela, Yap Fabian, Tan Kok Hian, Chan Kok Yen Jerry, Chan Shiao-Yng, Goh Nicole, Rane Nikita, Tan Ethel Siew Ee, Jiang Yuheng, Han Mei, Meaney Michael, Wang Dennis, Keppo Jussi, Tan Geoffrey Chern-Yee

Affiliations

Computational Biology, National University of Singapore, Singapore, Singapore.

Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.

Publication details

Front Neuroinform. 2024 Feb 21;17:1244336. doi: 10.3389/fninf.2023.1244336. eCollection 2023.

Abstract

INTRODUCTION

Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes, and may benefit paediatric prescribing, where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and, consequently, prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort.

METHODS

Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 Mb of the CYP2D6 gene, and assessed the impact of adding demographic data. The samples were split into training (75%) and test (25%) sets for validation. For the Elastic Net and XGBoost models, the optimal hyperparameters were searched for using 10-fold cross-validation. Root Mean Square Error and R-squared values were obtained to assess each model's performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified, several of which appeared to influence multiple CpG sites.
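As a rough illustration of the evaluation design described above (75%/25% split, Elastic Net tuned by 10-fold cross-validation, then RMSE and R-squared on the held-out set), the following is a minimal sketch using scikit-learn. It is not the authors' code: the abstract does not state which software or hyperparameter grid was used, so the `l1_ratio` grid, the helper names `simulate` and `fit_and_evaluate`, and the simulated genotype and methylation data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def simulate(n_samples=200, n_snps=20, seed=0):
    """Simulate SNP dosages (0/1/2) and one CpG methylation value per sample.

    Purely synthetic stand-in data; effect sizes are hypothetical.
    """
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)
    beta = np.zeros(n_snps)
    beta[:3] = [0.08, -0.05, 0.03]  # three hypothetical causal SNPs
    y = 0.5 + X @ beta + rng.normal(0.0, 0.01, size=n_samples)
    return X, y


def fit_and_evaluate(X, y, seed=0):
    """Split 75/25, tune Elastic Net with 10-fold CV, score on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # ElasticNetCV searches its own alpha path for each l1_ratio in the
    # (assumed) grid, using 10-fold cross-validation on the training set only.
    model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=10, random_state=seed)
    model.fit(X_train, y_train)

    pred = model.predict(X_test)
    return {
        "n_train": X_train.shape[0],
        "n_test": X_test.shape[0],
        "alpha": float(model.alpha_),
        "l1_ratio": float(model.l1_ratio_),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
        "n_selected_snps": int(np.sum(model.coef_ != 0)),
    }


if __name__ == "__main__":
    X, y = simulate()
    for key, value in fit_and_evaluate(X, y).items():
        print(f"{key}: {value}")
```

In this sketch the test set is never seen during hyperparameter search, so the reported RMSE and R-squared reflect held-out performance. The same split could be reused for a Linear Regression or XGBoost model to make a like-for-like comparison across methods.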

RESULTS

Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites and a number of top variables were identified for each model.

DISCUSSION

The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c71/10915285/53ae6c1dc7e1/fninf-17-1244336-g001.jpg
