Genome-wide detection of intervals of genetic heterogeneity associated with complex traits.

Authors

Llinares-López Felipe, Grimm Dominik G, Bodenham Dean A, Gieraths Udo, Sugiyama Mahito, Rowan Beth, Borgwardt Karsten

Affiliations

Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan; JST, PRESTO, Japan; and Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany.

Publication information

Bioinformatics. 2015 Jun 15;31(12):i240-9. doi: 10.1093/bioinformatics/btv263.

DOI:10.1093/bioinformatics/btv263

PMID:26072488

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4559912/

Abstract

MOTIVATION

Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals.

RESULTS

Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping.
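To make the scale of the multiple testing burden concrete, the following is a minimal sketch of a naive baseline, not the authors' algorithm. It assumes binary SNP encodings and a binary phenotype, summarises each contiguous interval by whether a sample carries at least one variant in it, scores the interval with a two-sided Fisher exact test, and applies a Bonferroni correction over all candidate intervals. The function names, the encoding and the choice of test are illustrative assumptions; the abstract does not specify them.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def scan_intervals(genotypes, phenotype, alpha=0.05):
    """Naively test every contiguous SNP interval (hypothetical helper).

    genotypes: one list of 0/1 values per sample, one entry per SNP.
    phenotype: one 0/1 label per sample (1 = case, 0 = control).
    Returns (hits, threshold), where hits holds (start, end, p_value) tuples.
    """
    n_snps = len(genotypes[0])
    n_tests = n_snps * (n_snps + 1) // 2  # number of contiguous intervals
    threshold = alpha / n_tests  # Bonferroni-corrected significance level
    hits = []
    for start in range(n_snps):
        carrier = [0] * len(genotypes)
        for end in range(start, n_snps):
            # Extend the interval by one SNP: a sample is a carrier if it has
            # a variant at any position in [start, end].
            carrier = [c | g[end] for c, g in zip(carrier, genotypes)]
            a = sum(1 for c, y in zip(carrier, phenotype) if c and y)
            b = sum(1 for c, y in zip(carrier, phenotype) if c and not y)
            c_ = sum(1 for c, y in zip(carrier, phenotype) if not c and y)
            d = sum(1 for c, y in zip(carrier, phenotype) if not c and not y)
            p = fisher_exact_two_sided(a, b, c_, d)
            if p <= threshold:
                hits.append((start, end, p))
    return hits, threshold
```

The number of tests grows quadratically with the number of SNPs, so the Bonferroni threshold in this baseline shrinks rapidly on genome-wide data and the enumeration itself becomes expensive. These are the two problems the abstract says the proposed approach is designed to overcome.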

CONCLUSIONS

Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes.

AVAILABILITY AND IMPLEMENTATION

The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html.

Similar articles

Genome-wide genetic heterogeneity discovery with categorical covariates.

Bioinformatics. 2017 Jun 15;33(12):1820-1828. doi: 10.1093/bioinformatics/btx071.

A Lasso multi-marker mixed model for association mapping with population structure correction.

Bioinformatics. 2013 Jan 15;29(2):206-14. doi: 10.1093/bioinformatics/bts669. Epub 2012 Nov 22.

Exploring the Genetic Patterns of Complex Diseases via the Integrative Genome-Wide Approach.

IEEE/ACM Trans Comput Biol Bioinform. 2016 May-Jun;13(3):557-64. doi: 10.1109/TCBB.2015.2459692.

araGWAB: Network-based boosting of genome-wide association studies in Arabidopsis thaliana.

Sci Rep. 2018 Feb 13;8(1):2925. doi: 10.1038/s41598-018-21301-4.

SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies.

Bioinformatics. 2009 Feb 15;25(4):504-11. doi: 10.1093/bioinformatics/btn652. Epub 2008 Dec 19.

An Efficient Nonlinear Regression Approach for Genome-wide Detection of Marginal and Interacting Genetic Variations.

J Comput Biol. 2016 May;23(5):372-89. doi: 10.1089/cmb.2015.0202.

Efficient network-guided multi-locus association mapping with graph cuts.

Bioinformatics. 2013 Jul 1;29(13):i171-9. doi: 10.1093/bioinformatics/btt238.

Genome-wide association studies with high-dimensional phenotypes.

Stat Appl Genet Mol Biol. 2013 Aug;12(4):413-31. doi: 10.1515/sagmb-2012-0032.

Genome-wide association study for endocrine fertility traits using single nucleotide polymorphism arrays and sequence variants in dairy cattle.

J Dairy Sci. 2016 Jul;99(7):5470-5485. doi: 10.3168/jds.2015-10533. Epub 2016 May 4.

Cited by

Population-aware permutation-based significance thresholds for genome-wide association studies.

Bioinform Adv. 2024 Oct 28;4(1):vbae168. doi: 10.1093/bioadv/vbae168. eCollection 2024.

The benefits of permutation-based genome-wide association studies.

J Exp Bot. 2024 Sep 11;75(17):5377-5389. doi: 10.1093/jxb/erae280.

Genome-wide analysis to uncover how Pocillopora acuta survives the challenging intertidal environment.

Sci Rep. 2024 Apr 12;14(1):8538. doi: 10.1038/s41598-024-59268-0.

DeeP4med: deep learning for P4 medicine to predict normal and cancer transcriptome in multiple human tissues.

BMC Bioinformatics. 2023 Jul 4;24(1):275. doi: 10.1186/s12859-023-05400-2.

Higher-order genetic interaction discovery with network-based biological priors.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i523-i533. doi: 10.1093/bioinformatics/btad273.

Efficient permutation-based genome-wide association studies for normal and skewed phenotypic distributions.

Bioinformatics. 2022 Sep 16;38(Suppl_2):ii5-ii12. doi: 10.1093/bioinformatics/btac455.

Genetic heterogeneity: Challenges, impacts, and methods through an associative lens.

Genet Epidemiol. 2022 Dec;46(8):555-571. doi: 10.1002/gepi.22497. Epub 2022 Aug 4.

CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i36-i44. doi: 10.1093/bioinformatics/btac238.

Network-guided search for genetic heterogeneity between gene pairs.

Bioinformatics. 2021 Apr 9;37(1):57-65. doi: 10.1093/bioinformatics/btaa581.

HiSSI: high-order SNP-SNP interactions detection based on efficient significant pattern and differential evolution.

BMC Med Genomics. 2019 Dec 30;12(Suppl 7):139. doi: 10.1186/s12920-019-0584-6.
