
A comparison of internal validation techniques for multifactor dimensionality reduction.

Affiliation

Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.

Publication details

BMC Bioinformatics. 2010 Jul 22;11:394. doi: 10.1186/1471-2105-11-394.

DOI:10.1186/1471-2105-11-394
PMID:20650002
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2920275/
Abstract

BACKGROUND

It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, internal validation procedures are important for obtaining prediction estimates, preventing model over-fitting, and reducing potential false positive findings. Currently, MDR uses cross-validation for internal validation. In this study, we combine a three-way split (3WS) of the data with a post-hoc pruning procedure as an alternative to cross-validation for internal model validation. The aim is to reduce computation time without impairing performance. For a range of single-locus and epistatic disease models, we compare the power of MDR with 5- and 10-fold cross-validation against the power of MDR with 3WS to detect the true disease-causing loci. Additionally, we analyze an HIV immunogenetics dataset to demonstrate the results of the two strategies on real data.
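The core MDR step can be sketched as follows. This is a minimal illustration, not the implementation used in the paper. It assumes the following:

- genotypes are coded 0/1/2 per SNP and labels are binary case/control;
- the high-risk threshold is the overall case:control ratio;
- balanced accuracy is the fitness measure;
- genotype cells never seen in training are assigned low risk.

The function names `mdr_fit`, `mdr_predict`, `balanced_accuracy` and `mdr_search` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mdr_fit(X, y, loci, threshold=None):
    """Label each multilocus genotype cell high-risk (1) or low-risk (0)."""
    cases, controls = Counter(), Counter()
    for row, label in zip(X, y):
        cell = tuple(row[i] for i in loci)
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        # Assumed default: the overall case:control ratio in the data.
        n_cases = sum(y)
        threshold = n_cases / (len(y) - n_cases)
    # A cell is high-risk when its case:control ratio exceeds the threshold;
    # cases > threshold * controls avoids dividing by a zero control count.
    return {cell: int(cases[cell] > threshold * controls[cell])
            for cell in set(cases) | set(controls)}


def mdr_predict(model, X, loci):
    # Unseen genotype cells default to low risk (an assumption).
    return [model.get(tuple(row[i] for i in loci), 0) for row in X]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def mdr_search(X, y, k):
    """Exhaustively score every k-locus combination on the training data."""
    best = None
    for loci in combinations(range(len(X[0])), k):
        model = mdr_fit(X, y, loci)
        score = balanced_accuracy(y, mdr_predict(model, X, loci))
        if best is None or score > best[0]:
            best = (score, loci, model)
    return best  # (training balanced accuracy, loci, cell labels)
```

Consider a toy dataset where `X` holds all eight combinations of three binary SNPs and `y` is SNP 0 XOR SNP 1. Here `mdr_search(X, y, k=2)` selects loci `(0, 1)` with a training balanced accuracy of 1.0, because the noise SNP 2 cannot separate cases from controls. A training score like this is optimistic, which is why an internal validation step (cross-validation or 3WS) is needed.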

RESULTS

MDR with 3WS is approximately five times faster computationally than MDR with 5-fold cross-validation. Before pruning, 5-fold cross-validation has higher power than 3WS to find exactly the true disease loci without detecting any false positive loci. However, the two methods have equivalent power to find the true disease-causing loci together with false positive loci. When a pruning procedure is applied after the 3WS, the power of the 3WS approach to detect only the exact disease loci is equivalent to that of MDR with cross-validation. In the real data application, the cross-validation and 3WS analyses identify the same two-locus model.
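The two internal validation schemes differ in how the data are partitioned and in how many times the exhaustive model search must run. The sketch below is illustrative only. The 50/25/25 split proportions, the set roles (training, testing, validation) and the helper names are assumptions, not details given in the abstract. It also assumes that one exhaustive search over all 1- to k-locus combinations is run per training set.

```python
import random
from math import comb
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, held-out) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def three_way_split(n: int, fractions=(0.5, 0.25, 0.25), seed: int = 0):
    """Partition indices once into training, testing and validation sets.

    Assumed roles: build candidate models on training, choose among them on
    testing, and estimate the chosen model's accuracy on validation.
    The validation set takes whatever remains after the first two.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n)
    n_test = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def exhaustive_fits(n_snps: int, max_order: int, n_training_sets: int) -> int:
    """MDR models fitted when every 1..max_order locus combination is
    evaluated once per training set."""
    per_search = sum(comb(n_snps, k) for k in range(1, max_order + 1))
    return per_search * n_training_sets
```

Take 10 SNPs and models of up to two loci. One search evaluates 10 + 45 = 55 combinations. Under that assumption, 5-fold cross-validation evaluates 5 × 55 = 275 models, while a single 3WS training set evaluates 55. This count is one plausible contributor to the roughly fivefold speedup reported above. It is not a full accounting of it.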

CONCLUSIONS

Our results show that the two internal validation methods perform equivalently when pruning procedures are used. The specific pruning procedure should be chosen with a trade-off in mind. On one side, a procedure may identify all relevant genetic effects at the cost of including false positives. On the other, it may exclude false positives at the risk of missing important genetic factors. This implies that 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
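The pruning trade-off can be illustrated with a simple backward-elimination rule. This is a hypothetical procedure, not the pruning method evaluated in the paper, which the abstract does not specify. Under this rule, a locus is dropped whenever removing it keeps the held-out score within `tol` of the unpruned model's score. The `score` callable and the toy scores in the usage note are invented for illustration. Because the rule is greedy, the scan order can affect which loci survive.

```python
from typing import Callable, Sequence, Tuple

Loci = Tuple[int, ...]


def backward_prune(loci: Sequence[int],
                   score: Callable[[Loci], float],
                   tol: float = 0.0) -> Tuple[Loci, float]:
    """Hypothetical post-hoc pruning by greedy backward elimination.

    score(loci) should return a held-out (e.g. validation-set) performance
    measure for the model built on those loci; higher is better.
    """
    current: Loci = tuple(loci)
    baseline = score(current)  # performance of the unpruned model
    current_score = baseline
    pruned = True
    while pruned and len(current) > 1:
        pruned = False
        for locus in current:
            reduced = tuple(l for l in current if l != locus)
            reduced_score = score(reduced)
            # Drop the locus if performance stays within tol of the baseline.
            if reduced_score >= baseline - tol:
                current, current_score = reduced, reduced_score
                pruned = True
                break  # rescan the smaller model from the start
    return current, current_score
```

Suppose the invented scores give true loci 1 and 3 a score of 0.70 together and give false-positive locus 7 no added value. With `tol=0.0`, locus 7 is removed and `(1, 3)` is kept. With `tol=0.25`, the rule also discards the true loci and keeps only `(7,)`. This shows how an overly permissive pruning rule can end up missing important genetic factors.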



Similar articles

1
A comparison of internal validation techniques for multifactor dimensionality reduction.
BMC Bioinformatics. 2010 Jul 22;11:394. doi: 10.1186/1471-2105-11-394.
2
A comparison of internal model validation methods for multifactor dimensionality reduction in the case of genetic heterogeneity.
BMC Res Notes. 2012 Nov 5;5:623. doi: 10.1186/1756-0500-5-623.
3
A comparison of multifactor dimensionality reduction and L1-penalized regression to identify gene-gene interactions in genetic association studies.
Stat Appl Genet Mol Biol. 2011;10(1):Article 4. doi: 10.2202/1544-6115.1613. Epub 2011 Jan 6.
4
A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction.
Genet Epidemiol. 2009 Jan;33(1):87-94. doi: 10.1002/gepi.20360.
5
A novel survival multifactor dimensionality reduction method for detecting gene-gene interactions with application to bladder cancer prognosis.
Hum Genet. 2011 Jan;129(1):101-10. doi: 10.1007/s00439-010-0905-5. Epub 2010 Oct 28.
6
Spatial rank-based multifactor dimensionality reduction to detect gene-gene interactions for multivariate phenotypes.
BMC Bioinformatics. 2021 Oct 4;22(1):480. doi: 10.1186/s12859-021-04395-y.
7
Exploring the performance of Multifactor Dimensionality Reduction in large scale SNP studies and in the presence of genetic heterogeneity among epistatic disease models.
Hum Hered. 2009;67(3):183-92. doi: 10.1159/000181157. Epub 2008 Dec 15.
8
A cross-validation procedure for general pedigrees and matched odds ratio fitness metric implemented for the multifactor dimensionality reduction pedigree disequilibrium test.
Genet Epidemiol. 2010 Feb;34(2):194-9. doi: 10.1002/gepi.20447.
9
A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions.
Bioinformatics. 2016 Sep 1;32(17):i605-i610. doi: 10.1093/bioinformatics/btw424.
10
Model-based multifactor dimensionality reduction for detecting epistasis in case-control data in the presence of noise.
Ann Hum Genet. 2011 Jan;75(1):78-89. doi: 10.1111/j.1469-1809.2010.00604.x. Epub 2010 Sep 8.

Cited by

1
Gene Polymorphisms and Gene-Gene Interactions Are Associated with Restenosis after Coronary Stenting.
Biomolecules. 2022 May 31;12(6):765. doi: 10.3390/biom12060765.
2
A Machine Learning Algorithm for Quantitatively Diagnosing Oxidative Stress Risks in Healthy Adult Individuals Based on Health Space Methodology: A Proof-of-Concept Study Using Korean Cross-Sectional Cohort Data.
Antioxidants (Basel). 2021 Jul 16;10(7):1132. doi: 10.3390/antiox10071132.
3
KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies.
BMC Bioinformatics. 2017 Mar 21;18(1):184. doi: 10.1186/s12859-017-1599-7.
4
A roadmap to multifactor dimensionality reduction methods.
Brief Bioinform. 2016 Mar;17(2):293-308. doi: 10.1093/bib/bbv038. Epub 2015 Jun 24.
5
Evaluation of genetic risk score models in the presence of interaction and linkage disequilibrium.
Front Genet. 2013 Jul 23;4:138. doi: 10.3389/fgene.2013.00138. eCollection 2013.
6
Hip fracture risk assessment: artificial neural network outperforms conditional logistic regression in an age- and sex-matched case control study.
BMC Musculoskelet Disord. 2013 Jul 15;14:207. doi: 10.1186/1471-2474-14-207.
7
A comparison of internal model validation methods for multifactor dimensionality reduction in the case of genetic heterogeneity.
BMC Res Notes. 2012 Nov 5;5:623. doi: 10.1186/1756-0500-5-623.
8
A new explained-variance based genetic risk score for predictive modeling of disease risk.
Stat Appl Genet Mol Biol. 2012 Sep 25;11(4):Article 15. doi: 10.1515/1544-6115.1796.
9
Performance analysis of novel methods for detecting epistasis.
BMC Bioinformatics. 2011 Dec 15;12:475. doi: 10.1186/1471-2105-12-475.
10
An R package implementation of multifactor dimensionality reduction.
BioData Min. 2011 Aug 16;4(1):24. doi: 10.1186/1756-0381-4-24.