An efficient algorithm to perform multiple testing in epistasis screening.

Author information

Systems and Modeling Unit, Montefiore Institute, University of Liège, 4000 Liège, Belgium.

Publication information

BMC Bioinformatics. 2013 Apr 24;14:138. doi: 10.1186/1471-2105-14-138.

DOI:10.1186/1471-2105-14-138

PMID:23617239

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3648350/

Abstract

BACKGROUND

Research on detecting epistasis, or gene-gene interaction, in human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved efforts to translate statistical epistasis into biological epistasis, and attempts to integrate different omics information sources into epistasis screening to enhance power. The search for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies therefore require memory proportional to the square of the number of SNPs, and a genome-wide epistasis search would require terabytes of memory. Cache problems are then likely to occur, increasing the computation time. In this work we present a new version of maxT whose memory requirement is independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease.
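The memory argument above can be illustrated with a minimal sketch of a single-step maxT permutation correction that streams over SNP pairs. This is not the MBMDR-3.0.3 implementation: the function names, the toy data, and the association score (absolute Pearson correlation between the genotype product and the trait, standing in for the MB-MDR statistic) are assumptions made for illustration only. The point it shows is that only one maximum per permutation and a short list of the best observed pairs need to be stored, so memory does not grow with the number of pairs.

```python
import heapq
import math
import random
from itertools import combinations


def pair_statistic(g1, g2, trait):
    """Placeholder score (NOT the MB-MDR statistic): absolute Pearson
    correlation between the genotype product g1*g2 and the trait."""
    n = len(trait)
    x = [a * b for a, b in zip(g1, g2)]
    mx = sum(x) / n
    mt = sum(trait) / n
    sxy = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, trait))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((ti - mt) ** 2 for ti in trait)
    denom = math.sqrt(sxx * syy)
    return abs(sxy) / denom if denom > 0 else 0.0


def maxt_streaming(genotypes, trait, n_perm=19, top_n=3, seed=0):
    """Single-step maxT adjusted p-values for the top_n best pairs.

    Stored state: n_perm permuted traits, n_perm running maxima and a
    heap of top_n pairs. None of these grows with the number of pairs.
    """
    rng = random.Random(seed)
    permuted = []
    for _ in range(n_perm):
        t = list(trait)
        rng.shuffle(t)
        permuted.append(t)

    perm_max = [0.0] * n_perm  # largest statistic seen in each permutation
    top = []                   # min-heap of (observed statistic, pair)

    for i, j in combinations(range(len(genotypes)), 2):
        obs = pair_statistic(genotypes[i], genotypes[j], trait)
        if len(top) < top_n:
            heapq.heappush(top, (obs, (i, j)))
        elif obs > top[0][0]:
            heapq.heapreplace(top, (obs, (i, j)))
        # Update each permutation's running maximum, then discard the value.
        for b, t in enumerate(permuted):
            s = pair_statistic(genotypes[i], genotypes[j], t)
            if s > perm_max[b]:
                perm_max[b] = s

    results = []
    for obs, pair in sorted(top, reverse=True):
        exceed = sum(1 for m in perm_max if m >= obs)
        results.append((pair, obs, (exceed + 1) / (n_perm + 1)))
    return results


# Toy data (hypothetical): 81 individuals in a full 3^4 factorial design
# over 4 SNPs coded 0/1/2, with the trait driven by SNP 0 x SNP 1 only.
genotypes = [[(i // 3 ** k) % 3 for i in range(81)] for k in range(4)]
trait = [a * b for a, b in zip(genotypes[0], genotypes[1])]
results = maxt_streaming(genotypes, trait)
for pair, stat, p_adj in results:
    print(pair, round(stat, 3), p_adj)
```

The sketch pays for its small memory footprint by recomputing every pair statistic for every permutation. It reports adjusted p-values only for the retained top pairs.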

RESULTS

In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions in a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data.
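To put the binary-trait run in perspective, the following back-of-the-envelope calculation estimates the workload implied by the figures above. It rests on two assumptions that the abstract does not state: every pair is re-evaluated for the observed trait and for each of the 999 permutations, and all 160 cores (10 blades x 4 processors x 4 cores) are busy for the whole run. The resulting rates are therefore rough estimates, not reported measurements.

```python
from math import comb

n_snps = 100_000
n_perm = 999

# Number of unordered SNP pairs: n * (n - 1) / 2.
n_pairs = comb(n_snps, 2)

# Assumption: one evaluation per pair for the observed trait
# plus one per pair for each permutation.
tests_total = n_pairs * (n_perm + 1)

elapsed_s = (4 * 24 + 9) * 3600   # 4 days and 9 hours, in seconds
cores = 10 * 4 * 4                # blades x processors x cores per processor

rate_total = tests_total / elapsed_s
rate_core = rate_total / cores

print(f"pairs:               {n_pairs:,}")
print(f"pair evaluations:    {tests_total:,}")
print(f"evaluations/s:       {rate_total:,.0f}")
print(f"evaluations/s/core:  {rate_core:,.0f}")
```

Under these assumptions the run covers about 5 x 10^9 pairs and about 5 x 10^12 pair evaluations, which is why storing even one value per evaluation is impractical.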

CONCLUSIONS

Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rate. A new implementation aimed at genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 identified epistasis involving regions that are well known in the field and that can be explained from a biological point of view. This demonstrates the power of our software to find relevant higher-order phenotype-genotype associations.



