Llinares-López Felipe, Papaxanthos Laetitia, Bodenham Dean, Roqueiro Damian, Borgwardt Karsten
Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Bioinformatics. 2017 Jun 15;33(12):1820-1828. doi: 10.1093/bioinformatics/btx071.
Genetic heterogeneity is the phenomenon that distinct genetic variants may give rise to the same phenotype. The recently introduced algorithm Fast Automatic Interval Search (FAIS) enables the genome-wide search of candidate regions for genetic heterogeneity in the form of any contiguous sequence of variants, and achieves high computational efficiency and statistical power. Although FAIS can test all possible genomic regions for association with a phenotype, a key limitation is its inability to correct for confounders such as gender or population structure, which may lead to numerous false-positive associations.
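To make the search space concrete, the following minimal sketch enumerates every contiguous interval of variants and collapses each one into a single binary "meta-marker" per sample. It assumes binary-encoded variants and an OR-style pooling rule (a sample carries the meta-marker if it carries at least one variant in the interval). The function name `interval_meta_markers` and the `max_len` cap are hypothetical, introduced only for illustration. This brute-force enumeration is not the FAIS algorithm itself, which reaches its efficiency by other means.

```python
from typing import Iterator, List, Tuple


def interval_meta_markers(
    genotypes: List[List[int]], max_len: int
) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (start, end, meta_marker) for every contiguous interval.

    genotypes[i][j] is 1 if sample i carries variant j, else 0.
    meta_marker[i] is 1 if sample i carries any variant in [start, end].
    max_len is an illustrative cap on interval length (an assumption).
    """
    n_variants = len(genotypes[0]) if genotypes else 0
    for start in range(n_variants):
        # Running OR over the variants seen so far, one entry per sample.
        current = [0] * len(genotypes)
        for end in range(start, min(start + max_len, n_variants)):
            current = [c | row[end] for c, row in zip(current, genotypes)]
            yield start, end, list(current)


if __name__ == "__main__":
    # Three samples, three variants: two samples carry different variants.
    toy = [[1, 0, 0], [0, 0, 1], [0, 0, 0]]
    for start, end, marker in interval_meta_markers(toy, max_len=3):
        print(start, end, marker)
```

In the toy data, no single variant is shared by the first two samples, yet the interval spanning all three variants marks both of them. That is the kind of signal an interval-based search is meant to pick up.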
We propose FastCMH, a method that overcomes this problem by properly accounting for categorical confounders, while still retaining statistical power and computational efficiency. Experiments comparing FastCMH with FAIS and multiple kinds of burden tests on simulated data, as well as on human and Arabidopsis samples, demonstrate that FastCMH can drastically reduce genomic inflation and discover associations that are missed by standard burden tests.
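As an illustration of how a categorical confounder can be accounted for, the sketch below computes a Cochran-Mantel-Haenszel (CMH) statistic for one binary marker, stratified by the confounder. It rests on the assumption that the "CMH" in FastCMH refers to this test. The function name `cmh_test` and its arguments are hypothetical and do not reflect the API of the fastcmh package. No continuity correction is applied, and the p-value uses the chi-squared distribution with one degree of freedom.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def cmh_test(
    marker: Sequence[int],
    phenotype: Sequence[int],
    covariate: Sequence[Hashable],
) -> Tuple[float, float]:
    """Return (statistic, p_value) of a CMH test without continuity correction.

    marker and phenotype are 0/1 per sample; covariate is the stratum label.
    """
    # Per stratum: [a, n1, m1, n] = carriers among cases, cases, carriers, total.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, y, c in zip(marker, phenotype, covariate):
        t = tables[c]
        t[0] += x & y
        t[1] += y
        t[2] += x
        t[3] += 1

    observed = expected = variance = 0.0
    for a, n1, m1, n in tables.values():
        if n < 2:
            continue  # a single-sample stratum carries no information
        observed += a
        expected += n1 * m1 / n
        variance += n1 * (n - n1) * m1 * (n - m1) / (n * n * (n - 1))

    if variance == 0.0:
        return 0.0, 1.0  # marker or phenotype is constant within every stratum
    stat = (observed - expected) ** 2 / variance
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # The marker is fully determined by the stratum, so it adds no evidence.
    marker = [0, 0, 0, 0, 1, 1, 1, 1]
    phenotype = [0, 0, 0, 1, 1, 1, 1, 0]
    print(cmh_test(marker, phenotype, covariate=marker))
```

Because the statistic sums deviations within strata, a marker that merely tracks the confounder contributes nothing, whereas an unstratified test on the same data would report an association.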
An R package, fastcmh, is available on CRAN, and the source code can be found at: https://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/fastcmh.html.
Supplementary data are available at Bioinformatics online.