Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection.

Affiliation

Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth Medical School, Lebanon, NH, USA.

Publication information

BioData Min. 2012 Sep 26;5(1):15. doi: 10.1186/1756-0381-5-15.

DOI:10.1186/1756-0381-5-15

PMID:23014095

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3549792/

Abstract

BACKGROUND

Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection.

RESULTS

We evaluate three metrics, derived from previous work, as predictors of relative model detection difficulty: (1) penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model's EDM and COR are each stronger predictors of model detection success than heritability.
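To make the inputs to such metrics concrete, the following is a minimal Python sketch of a two-locus penetrance table together with its genotype frequencies. It is not the GAMETES implementation. It computes prevalence and heritability using the usual penetrance-table definition, checks purity by comparing marginal penetrances with the prevalence, and computes PTV under the assumption that it is the unweighted variance of the penetrance values. The abstract gives no formulas for EDM or COR, so they are not reproduced here. All function names, and the example table, are hypothetical.

```python
from itertools import product

CELLS = list(product(range(3), repeat=2))  # the nine two-locus genotype cells


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for (AA, Aa, aa) given a minor allele frequency."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def joint_freqs(maf_a, maf_b):
    """Joint genotype frequencies for two independent loci (assumes no linkage)."""
    fa, fb = genotype_freqs(maf_a), genotype_freqs(maf_b)
    return [[fa[i] * fb[j] for j in range(3)] for i in range(3)]


def prevalence(pen, freq):
    """Population prevalence K: frequency-weighted mean penetrance."""
    return sum(freq[i][j] * pen[i][j] for i, j in CELLS)


def heritability(pen, freq):
    """Frequency-weighted penetrance variance around K, scaled by K(1 - K)."""
    k = prevalence(pen, freq)
    weighted_var = sum(freq[i][j] * (pen[i][j] - k) ** 2 for i, j in CELLS)
    return weighted_var / (k * (1.0 - k))


def penetrance_table_variance(pen):
    """ASSUMPTION: PTV taken as the unweighted variance of the table entries."""
    values = [v for row in pen for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def is_pure(pen, freq, tol=1e-9):
    """True if every single-locus marginal penetrance equals the prevalence."""
    k = prevalence(pen, freq)
    marginals = []
    for i in range(3):  # marginal penetrance per genotype at locus A
        marginals.append(sum(freq[i][j] * pen[i][j] for j in range(3)) / sum(freq[i]))
    for j in range(3):  # marginal penetrance per genotype at locus B
        col_freq = sum(freq[i][j] for i in range(3))
        marginals.append(sum(freq[i][j] * pen[i][j] for i in range(3)) / col_freq)
    return all(abs(m - k) <= tol for m in marginals)


# Hypothetical XOR-like example table, both loci with MAF 0.5.
example_pen = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
example_freq = joint_freqs(0.5, 0.5)

if __name__ == "__main__":
    print("prevalence:", prevalence(example_pen, example_freq))
    print("heritability:", heritability(example_pen, example_freq))
    print("PTV (assumed definition):", penetrance_table_variance(example_pen))
    print("pure:", is_pure(example_pen, example_freq))
```

In this example no single locus is informative on its own, because each marginal penetrance equals the prevalence. Two tables with the same heritability can still differ in how their penetrance values are arranged, which is the architectural variation the metrics above are meant to capture.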

CONCLUSIONS

This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a9e/3549792/95765627fe9a/1756-0381-5-15-1.jpg
