Predicting disease risk using bootstrap ranking and classification algorithms.

Affiliation

Dept of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel.

Publication information

PLoS Comput Biol. 2013;9(8):e1003200. doi: 10.1371/journal.pcbi.1003200. Epub 2013 Aug 22.

DOI:10.1371/journal.pcbi.1003200
PMID:23990773
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3749941/
Abstract

Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a "black box" in order to promote changes in life-style and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minimum allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
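
The abstract describes BootRank only at a high level: bootstrapping is used to obtain a robust prioritization of SNPs before classification. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the association statistic (a standardized case/control difference in mean allele count, standing in for a p-value-based test), the stratified resampling, the aggregation by mean rank, and the hypothetical function names `association_scores`, `bootstrap_rank`, and `top_snps`. The data are synthetic.

```python
import numpy as np


def association_scores(genotypes, labels):
    """Per-SNP standardized difference in mean allele count (cases vs. controls).

    Assumption: a simple stand-in for a p-value-based association test.
    """
    cases = genotypes[labels == 1]
    controls = genotypes[labels == 0]
    diff = cases.mean(axis=0) - controls.mean(axis=0)
    se = np.sqrt(cases.var(axis=0) / len(cases)
                 + controls.var(axis=0) / len(controls))
    return np.abs(diff) / (se + 1e-12)  # small constant avoids division by zero


def bootstrap_rank(genotypes, labels, n_boot=50, seed=0):
    """Mean rank of each SNP across bootstrap replicates (lower = stronger)."""
    rng = np.random.default_rng(seed)
    case_idx = np.flatnonzero(labels == 1)
    ctrl_idx = np.flatnonzero(labels == 0)
    rank_sum = np.zeros(genotypes.shape[1])
    for _ in range(n_boot):
        # Assumption: resample cases and controls separately, with replacement,
        # so every replicate keeps both classes.
        idx = np.concatenate([
            rng.choice(case_idx, size=case_idx.size),
            rng.choice(ctrl_idx, size=ctrl_idx.size),
        ])
        scores = association_scores(genotypes[idx], labels[idx])
        # Double argsort converts scores to ranks; rank 0 is the top SNP.
        rank_sum += np.argsort(np.argsort(-scores))
    return rank_sum / n_boot


def top_snps(genotypes, labels, k, **kwargs):
    """Indices of the k SNPs with the lowest mean bootstrap rank."""
    return np.argsort(bootstrap_rank(genotypes, labels, **kwargs))[:k]


# Synthetic example: 400 individuals, 200 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(1)
n, m = 400, 200
labels = np.repeat([0, 1], n // 2)
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
# Plant a signal: cases carry the allele more often at SNPs 0..4.
genotypes[labels == 1, :5] = rng.binomial(2, 0.6, size=(n // 2, 5))

selected = top_snps(genotypes, labels, k=5)
print(sorted(selected.tolist()))
```

In this sketch the selected columns, `genotypes[:, selected]`, would then be passed to a classifier of choice. Averaging ranks over replicates is one plausible way to make the selection less sensitive to any single sample; the paper may aggregate differently.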

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7814/3749941/00ba66345356/pcbi.1003200.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7814/3749941/9b7417f77668/pcbi.1003200.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7814/3749941/d1f175645477/pcbi.1003200.g002.jpg

Similar articles

1
Predicting disease risk using bootstrap ranking and classification algorithms.
PLoS Comput Biol. 2013;9(8):e1003200. doi: 10.1371/journal.pcbi.1003200. Epub 2013 Aug 22.
2
Disease liability prediction from large scale genotyping data using classifiers with a reject option.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):88-97. doi: 10.1109/TCBB.2011.44. Epub 2011 Mar 3.
3
Improved prediction of cardiovascular disease based on a panel of single nucleotide polymorphisms identified through genome-wide association studies.
Circ Cardiovasc Genet. 2010 Oct;3(5):468-74. doi: 10.1161/CIRCGENETICS.110.946269. Epub 2010 Aug 21.
4
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
5
Effective genetic-risk prediction using mixed models.
Am J Hum Genet. 2014 Oct 2;95(4):383-93. doi: 10.1016/j.ajhg.2014.09.007.
6
MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study.
BMC Bioinformatics. 2009 Jan 9;10:13. doi: 10.1186/1471-2105-10-13.
7
GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS.
BMC Genomics. 2013;14 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2164-14-S3-S10. Epub 2013 May 28.
8
Assessing statistical significance in multivariable genome wide association analysis.
Bioinformatics. 2016 Jul 1;32(13):1990-2000. doi: 10.1093/bioinformatics/btw128. Epub 2016 Mar 7.
9
Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies.
PLoS One. 2015 Aug 18;10(8):e0135832. doi: 10.1371/journal.pone.0135832. eCollection 2015.
10
Performance of risk prediction for inflammatory bowel disease based on genotyping platform and genomic risk score method.
BMC Med Genet. 2017 Aug 29;18(1):94. doi: 10.1186/s12881-017-0451-2.

Cited by

1
Identification and validation of oxidative stress-related diagnostic markers for recurrent pregnancy loss: insights from machine learning and molecular analysis.
Mol Divers. 2024 Sep 3. doi: 10.1007/s11030-024-10947-0.
2
Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations.
Sensors (Basel). 2023 May 1;23(9):4439. doi: 10.3390/s23094439.
3
Lack of association of genetic variants for diabetic retinopathy in Taiwanese patients with diabetic nephropathy.
BMJ Open Diabetes Res Care. 2020 Jan;8(1). doi: 10.1136/bmjdrc-2019-000727.
4
Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes.
Sci Rep. 2017 Feb 24;7:43381. doi: 10.1038/srep43381.
5
Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.
Sci Rep. 2016 Nov 28;6:36671. doi: 10.1038/srep36671.
6
The Prediction of Radiotherapy Toxicity Using Single Nucleotide Polymorphism-Based Models: A Step Toward Prevention.
Semin Radiat Oncol. 2015 Oct;25(4):281-91. doi: 10.1016/j.semradonc.2015.05.006. Epub 2015 May 15.
7
In silico phenotyping via co-training for improved phenotype prediction from genotype.
Bioinformatics. 2015 Jun 15;31(12):i303-10. doi: 10.1093/bioinformatics/btv254.
8
Integrative random forest for gene regulatory network inference.
Bioinformatics. 2015 Jun 15;31(12):i197-205. doi: 10.1093/bioinformatics/btv268.
9
Network tuned multiple rank aggregation and applications to gene ranking.
BMC Bioinformatics. 2015;16 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2105-16-S1-S6. Epub 2015 Jan 21.
10
Regularized machine learning in the genetic prediction of complex traits.
PLoS Genet. 2014 Nov 13;10(11):e1004754. doi: 10.1371/journal.pgen.1004754. eCollection 2014 Nov.

References

1
Improved heritability estimation from genome-wide SNPs.
Am J Hum Genet. 2012 Dec 7;91(6):1011-21. doi: 10.1016/j.ajhg.2012.10.010.
2
High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis.
Nat Genet. 2012 Dec;44(12):1336-40. doi: 10.1038/ng.2462. Epub 2012 Nov 11.
3
Genome-wide association study in Chinese men identifies two new prostate cancer risk loci at 9q31.2 and 19q13.4.
Nat Genet. 2012 Nov;44(11):1231-5. doi: 10.1038/ng.2424. Epub 2012 Sep 30.
4
A genome-wide association study identifies GRK5 and RASGRP1 as type 2 diabetes loci in Chinese Hans.
Diabetes. 2013 Jan;62(1):291-8. doi: 10.2337/db12-0454. Epub 2012 Sep 6.
5
Epistasis network centrality analysis yields pathway replication across two GWAS cohorts for bipolar disorder.
Transl Psychiatry. 2012 Aug 14;2(8):e154. doi: 10.1038/tp.2012.80.
6
A genome-wide association study identifies two new susceptibility loci for lung adenocarcinoma in the Japanese population.
Nat Genet. 2012 Jul 15;44(8):900-3. doi: 10.1038/ng.2353.
7
Risk estimation and risk prediction using machine-learning methods.
Hum Genet. 2012 Oct;131(10):1639-54. doi: 10.1007/s00439-012-1194-y. Epub 2012 Jul 3.
8
Blockade of the hedgehog pathway inhibits osteophyte formation in arthritis.
Ann Rheum Dis. 2012 Mar;71(3):400-7. doi: 10.1136/ard.2010.148262. Epub 2012 Jan 10.
9
Data mining approaches for genome-wide association of mood disorders.
Psychiatr Genet. 2012 Apr;22(2):55-61. doi: 10.1097/YPG.0b013e32834dc40d.
10
KEGG for integration and interpretation of large-scale molecular data sets.
Nucleic Acids Res. 2012 Jan;40(Database issue):D109-14. doi: 10.1093/nar/gkr988. Epub 2011 Nov 10.