
Improving genetic variant identification for quantitative traits using ensemble learning-based approaches.

Authors

Sharma Jyoti, Jangale Vaishnavi, Shekhawat Rajveer Singh, Yadav Pankaj

Affiliations

Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India.

School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India.

Publication information

BMC Genomics. 2025 Mar 12;26(1):237. doi: 10.1186/s12864-025-11443-x.

DOI: 10.1186/s12864-025-11443-x
PMID: 40075256
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11899862/

Abstract

BACKGROUND

Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. This has often led to significant variants being overlooked due to biases from genotype multicollinearity and strict p-value thresholds.

RESULTS

We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator [LASSO], ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. Our approach is evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of the top 100 SNPs identified through this best-performing ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12.

CONCLUSIONS

In conclusion, our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
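
The two-stage idea in the Results (regularized SNP selection followed by a machine-learning association model) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the simulated genotypes, effect sizes, hyperparameters, and the helper names `standardize`, `elastic_net`, and `linear_svr` are all assumptions made for illustration. The SVR here is a linear one fitted by subgradient descent, and ranking SNPs by absolute SVR weight is an assumed scoring rule; the abstract does not say which kernel or scoring the paper uses.

```python
import random

def standardize(col):
    """Center a column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
    return [(v - mean) / sd for v in col]

def elastic_net(cols, y, alpha=0.1, l1_ratio=0.5, n_iter=50):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*(l1_ratio*||b||_1 + 0.5*(1-l1_ratio)*||b||^2),
    assuming standardized columns and a centered response."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Covariance of SNP j with the partial residual (SNP j's own fit added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            # Soft-threshold for the L1 part, then shrink for the L2 part.
            shrunk = max(abs(rho) - alpha * l1_ratio, 0.0)
            new = (shrunk if rho > 0 else -shrunk) / (1.0 + alpha * (1.0 - l1_ratio))
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

def linear_svr(rows, y, eps=0.1, lam=0.01, lr=0.1, epochs=300):
    """Linear epsilon-insensitive SVR fitted by full-batch subgradient descent."""
    n, p = len(rows), len(rows[0])
    w, b = [0.0] * p, 0.0
    for t in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for x, target in zip(rows, y):
            err = target - (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if abs(err) > eps:  # only points outside the epsilon tube contribute
                s = 1.0 if err > 0 else -1.0
                gw = [g - s * xj / n for g, xj in zip(gw, x)]
                gb -= s / n
        step = lr / (t + 1) ** 0.5  # decaying step size
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b

# Simulated data (hypothetical): 300 samples, 30 SNPs coded 0/1/2, two causal SNPs.
random.seed(0)
n, p, maf = 300, 30, 0.3
causal = {3: 1.0, 17: -1.0}
geno = [[sum(random.random() < maf for _ in range(2)) for _ in range(n)]
        for _ in range(p)]
y = [sum(eff * geno[j][i] for j, eff in causal.items()) + random.gauss(0, 0.5)
     for i in range(n)]

cols = [standardize(c) for c in geno]
y_mean = sum(y) / n
y_centered = [v - y_mean for v in y]

# Step 1: elastic-net keeps the SNPs with non-zero coefficients.
beta = elastic_net(cols, y_centered)
selected = [j for j, bj in enumerate(beta) if bj != 0.0]

# Step 2: fit SVR on the selected SNPs and rank them by absolute weight.
rows = [[cols[j][i] for j in selected] for i in range(n)]
w, b = linear_svr(rows, y_centered)
ranked = sorted(zip(selected, w), key=lambda t: -abs(t[1]))
print("selected SNPs:", selected)
print("top-ranked SNPs:", [j for j, _ in ranked[:2]])
```

Running the selection step first shrinks the feature space, so the second-stage model is fitted only on SNPs that survive the penalty. This is one way to soften the multicollinearity and strict-threshold issues raised in the Background. In practice a library implementation (for example scikit-learn's `ElasticNet` and `SVR`) with cross-validated hyperparameters would replace these hand-written routines.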

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4710/11899862/d16884484984/12864_2025_11443_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4710/11899862/9176b35d82e6/12864_2025_11443_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4710/11899862/dc39dec5f03a/12864_2025_11443_Fig2_HTML.jpg

Similar articles

1
Improving genetic variant identification for quantitative traits using ensemble learning-based approaches.
BMC Genomics. 2025 Mar 12;26(1):237. doi: 10.1186/s12864-025-11443-x.
2
A machine learning pipeline for quantitative phenotype prediction from genotype data.
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S3. doi: 10.1186/1471-2105-11-S8-S3.
3
Genome-wide association analysis of body conformation traits in Chinese Holstein Cattle.
BMC Genomics. 2024 Dec 3;25(1):1174. doi: 10.1186/s12864-024-11090-8.
4
Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort.
J Med Genet. 2018 Nov;55(11):765-778. doi: 10.1136/jmedgenet-2018-105437. Epub 2018 Aug 30.
5
How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?
Pac Symp Biocomput. 2018;23:228-239.
6
Haplotype function score improves biological interpretation and cross-ancestry polygenic prediction of human complex traits.
Elife. 2024 Apr 19;12:RP92574. doi: 10.7554/eLife.92574.
7
Genetic architecture of quantitative traits in beef cattle revealed by genome wide association studies of imputed whole genome sequence variants: II: carcass merit traits.
BMC Genomics. 2020 Jan 13;21(1):38. doi: 10.1186/s12864-019-6273-1.
8
Machine-Learning-Based Genome-Wide Association Studies for Uncovering QTL Underlying Soybean Yield and Its Components.
Int J Mol Sci. 2022 May 16;23(10):5538. doi: 10.3390/ijms23105538.
9
Analysis of the genetic basis of fiber-related traits and flowering time in upland cotton using machine learning.
Theor Appl Genet. 2025 Jan 24;138(1):36. doi: 10.1007/s00122-025-04821-2.
10
Investigation of multi-trait associations using pathway-based analysis of GWAS summary statistics.
BMC Genomics. 2019 Feb 4;20(Suppl 1):79. doi: 10.1186/s12864-018-5373-7.
