A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets.

Affiliations

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.

Publication information

J Am Med Inform Assoc. 2014 Oct;21(e2):e312-9. doi: 10.1136/amiajnl-2013-002358. Epub 2014 Apr 15.

DOI: 10.1136/amiajnl-2013-002358
PMID: 24737607
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4173174/
Abstract

OBJECTIVE

The objective of this investigation is to evaluate methods for binary prediction of disease status from high-dimensional genomic data. The central hypothesis is that a Bayesian network (BN)-based method, the efficient Bayesian multivariate classifier (EBMC), will perform well at this task because it builds on BN-based methods that have performed well at learning epistatic interactions.
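Epistasis means that the effect of one SNP on disease depends on the genotype at another. As a minimal, hypothetical illustration (not the study's simulation model) of why such interactions are hard for methods that score one feature at a time, the sketch below assumes a two-SNP XOR model with dominant coding (1 = carrier, 0 = non-carrier) and equally frequent genotype combinations. The function names and the model itself are assumptions made for this example.

```python
from itertools import product

# Hypothetical two-SNP model (illustration only, not the study's simulation design).
# Dominant coding: 1 = carries at least one minor allele, 0 = does not.
def disease(g1: int, g2: int) -> int:
    """Pure XOR epistasis: affected iff exactly one of the two SNPs is a carrier."""
    return g1 ^ g2

# Assumption: all four genotype combinations are equally frequent.
POPULATION = list(product([0, 1], repeat=2))

def marginal_case_rate(snp: int, value: int) -> float:
    """P(disease = 1 | SNP[snp] == value), averaging over the other SNP."""
    matching = [g for g in POPULATION if g[snp] == value]
    return sum(disease(*g) for g in matching) / len(matching)

def joint_case_rate(g1: int, g2: int) -> float:
    """P(disease = 1 | both genotypes); deterministic in this toy model."""
    return float(disease(g1, g2))

if __name__ == "__main__":
    for snp in (0, 1):
        print(f"SNP{snp + 1} marginal:", [marginal_case_rate(snp, v) for v in (0, 1)])
    print("joint:", {g: joint_case_rate(*g) for g in POPULATION})
```

Every marginal case rate comes out to 0.5, so neither SNP looks associated on its own, while the joint genotype determines status exactly. A method must consider the pair together to find the signal.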

METHOD

We evaluate how well eight methods perform binary prediction using high-dimensional discrete genomic datasets containing epistatic interactions. The methods are naive Bayes (NB), model averaging NB (MANB), feature selection NB (FSNB), EBMC, logistic regression (LR), support vector machines (SVM), Lasso, and extreme learning machines (ELM). Our evaluation uses one hundred simulated datasets with 1000 single-nucleotide polymorphisms (SNPs) each, ten 10,000-SNP datasets, six semi-synthetic datasets, and two real genome-wide association study (GWAS) datasets.
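NB is the simplest of the eight methods, and MANB and FSNB are, as their names indicate, variants of it. The following is a hedged sketch of the general technique only; the study's own implementation, priors, and smoothing may differ. For each class, a discrete naive Bayes classifier counts how often each genotype value occurs at each SNP. It then scores a new sample by adding the log prior to the per-SNP log conditional probabilities. The class name `DiscreteNaiveBayes`, the add-alpha (Laplace) smoothing, binary labels {0, 1}, and the 0/1/2 genotype coding are all assumptions of this example.

```python
import math
from collections import defaultdict

class DiscreteNaiveBayes:
    """Naive Bayes over categorical features, e.g. SNP genotypes coded 0/1/2.

    Hypothetical sketch: binary labels {0, 1}, every feature takes one of
    `n_values` states, and add-alpha (Laplace) smoothing of conditional counts.
    """

    def __init__(self, alpha=1.0, n_values=3):
        self.alpha = alpha
        self.n_values = n_values

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = {c: 0 for c in self.classes}
        # counts[c][j][v]: number of training rows with label c whose feature j equals v
        self.counts = {c: [defaultdict(int) for _ in X[0]] for c in self.classes}
        for row, label in zip(X, y):
            self.class_counts[label] += 1
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
        return self

    def _log_joint(self, row):
        """log P(c) + sum_j log P(x_j | c) for each class c (features independent given c)."""
        n = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            nc = self.class_counts[c]
            s = math.log(nc / n)
            for j, v in enumerate(row):
                s += math.log((self.counts[c][j][v] + self.alpha) / (nc + self.alpha * self.n_values))
            scores[c] = s
        return scores

    def predict_proba(self, row):
        """Posterior P(label = 1 | row), normalized in log space for numerical stability."""
        scores = self._log_joint(row)
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        return weights[1] / sum(weights.values())

if __name__ == "__main__":
    X = [[0, 1, 2], [0, 0, 2], [2, 1, 0], [2, 2, 0]]  # toy genotypes: 4 samples x 3 SNPs
    y = [0, 0, 1, 1]
    model = DiscreteNaiveBayes().fit(X, y)
    print(model.predict_proba([0, 1, 2]), model.predict_proba([2, 2, 0]))
```

Smoothing keeps a genotype that was never seen in one class from driving that class's probability to zero. This matters when there are thousands of SNPs and comparatively few samples.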

RESULTS

In fivefold cross-validation studies, the SVM performed best on the 1000-SNP datasets, while the BN-based methods performed best on the other datasets, with EBMC showing the best overall performance. In-sample testing indicates that LR, SVM, Lasso, ELM, and NB tend to overfit the data.
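The two evaluation modes above can be contrasted in a short sketch. In-sample testing trains and evaluates on the same data, while fivefold cross-validation trains on four folds and evaluates on the held-out fifth. A large gap between the two scores is the usual sign of overfitting. This is an illustration under stated assumptions, not the study's protocol. Accuracy is used only for simplicity and is not necessarily the metric the study reported. The fold assignment is unstratified. `fit_memorizer` is a hypothetical worst-case overfitter, and all function names are invented for this example.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) for k-fold cross-validation over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def accuracy(predict, X, y, idx):
    return sum(predict(X[i]) == y[i] for i in idx) / len(idx)

def fit_memorizer(X, y, train):
    """Hypothetical extreme overfitter: recalls training rows exactly, else predicts the majority label."""
    table = {tuple(X[i]): y[i] for i in train}
    labels = [y[i] for i in train]
    majority = max(set(labels), key=labels.count)
    return lambda row: table.get(tuple(row), majority)

def in_vs_out_of_sample(X, y, fit, k=5, seed=0):
    """Return (in-sample accuracy, mean k-fold cross-validated accuracy)."""
    everything = list(range(len(y)))
    in_sample = accuracy(fit(X, y, everything), X, y, everything)
    cv = [accuracy(fit(X, y, tr), X, y, te) for tr, te in kfold_indices(len(y), k, seed)]
    return in_sample, sum(cv) / len(cv)

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.randint(0, 2) for _ in range(20)] for _ in range(200)]  # 200 samples x 20 SNPs
    y = [rng.randint(0, 1) for _ in range(200)]  # labels independent of X: nothing to learn
    ins, cv = in_vs_out_of_sample(X, y, fit_memorizer)
    print(f"in-sample accuracy {ins:.2f}, 5-fold CV accuracy {cv:.2f}")
```

On labels that carry no signal, the memorizer scores near-perfectly in-sample but only around chance under cross-validation. The in-sample score alone would therefore badly overstate its predictive ability.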

DISCUSSION

EBMC performed better than NB when there were several strong predictors, whereas NB performed better when there were many weak predictors. Furthermore, for all BN-based methods, prediction performance did not degrade as the dimension increased.
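Here is one hedged intuition for the weak-predictor result, offered as a reading of the general technique rather than the authors' analysis. Naive Bayes adds one log likelihood ratio per feature, so many individually weak but consistent signals can accumulate into a confident prediction. A model restricted to a few selected features discards most of that evidence. The sketch assumes 50 conditionally independent predictors, each with an identical likelihood ratio of 1.1, which is a deliberate simplification. The function names are hypothetical.

```python
import math

def nb_log_odds(prior_odds, likelihood_ratios):
    """Naive Bayes posterior log-odds: log prior odds + sum of per-feature log likelihood ratios.

    Each ratio is P(x_j | case) / P(x_j | control) for the observed value x_j; summing them
    is valid only under NB's assumption that features are independent given the class.
    """
    return math.log(prior_odds) + sum(math.log(r) for r in likelihood_ratios)

def to_probability(log_odds):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))

if __name__ == "__main__":
    weak = [1.1] * 50  # hypothetical: 50 weak predictors, each raising the odds by 10%
    all_features = nb_log_odds(1.0, weak)
    two_features = nb_log_odds(1.0, weak[:2])  # a model that kept only two of them
    print(f"all 50: P(case) = {to_probability(all_features):.3f}")
    print(f"only 2: P(case) = {to_probability(two_features):.3f}")
```

With even prior odds, the 50 combined ratios give posterior odds of 1.1^50 (about 117, a probability near 0.99). Keeping only two gives odds of 1.21 (a probability near 0.55). The converse case, where a few strong predictors sit among many uninformative SNPs whose noisy estimated ratios NB would still add, is not demonstrated here.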

CONCLUSIONS

Our results support the hypothesis that EBMC performs well at binary outcome prediction using high-dimensional discrete datasets containing epistatic-like interactions. Future research using more GWAS datasets is needed to further investigate the potential of EBMC.
