FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium.

Affiliation

Department of Computer Science, National University of Singapore, Singapore.

Publication information

BMC Bioinformatics. 2010 Jan 29;11:66. doi: 10.1186/1471-2105-11-66.

DOI: 10.1186/1471-2105-11-66

PMID: 20113476

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098109/

Abstract

BACKGROUND

The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), so it is not necessary to genotype all SNPs in an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that is sufficient to infer all the other SNPs. Algorithms based on the r² LD statistic have gained popularity because r² is directly related to the statistical power to detect disease associations. Most existing r²-based algorithms use pairwise LD. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot handle chromosomes containing more than 100k SNPs when length-3 tagging rules are used.
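As a minimal sketch of the pairwise r² statistic mentioned above, assuming phased haplotypes with each biallelic SNP coded as 0/1, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. The function names `r_squared` and `pairwise_tags`, the 0/1 encoding, and the 0.8 threshold are illustrative assumptions, not FastTagger's code.

```python
from itertools import combinations

def r_squared(a, b):
    """Pairwise LD r^2 between two biallelic SNPs.

    a, b: equal-length sequences of 0/1 alleles, one entry per phased haplotype.
    r^2 = D^2 / (pA(1-pA) * pB(1-pB)), where D = pAB - pA * pB.
    """
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # a monomorphic SNP: r^2 is undefined, treated here as 0
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom

def pairwise_tags(haplotypes, threshold=0.8):
    """Map each SNP index to the set of SNPs it tags with pairwise r^2 >= threshold."""
    m = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(m)]
    covers = {j: {j} for j in range(m)}  # every SNP trivially tags itself
    for i, j in combinations(range(m), 2):
        if r_squared(columns[i], columns[j]) >= threshold:
            covers[i].add(j)
            covers[j].add(i)
    return covers

haplotypes = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
print(pairwise_tags(haplotypes))  # {0: {0, 1}, 1: {0, 1}, 2: {2}}
```

Here SNPs 0 and 1 are in perfect LD and can tag each other, while SNP 2 is uncorrelated with both and would need its own tag under a purely pairwise scheme.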

RESULTS

We propose an efficient algorithm called FastTagger to calculate multi-marker tagging rules and to select tag SNPs based on multi-marker LD. FastTagger uses several techniques to reduce running time and memory consumption. Our experimental results show that FastTagger is several times faster than existing multi-marker-based tag SNP selection algorithms while consuming much less memory. As a result, FastTagger can handle chromosomes containing more than 100k SNPs using length-3 tagging rules. FastTagger also produces smaller sets of tag SNPs than existing multi-marker-based algorithms, with a reduction ratio of 3% to 9% when length-3 tagging rules are used. The generated tagging rules can also be used for genotype imputation. We studied the prediction accuracy of individual rules; the average accuracy is above 96% when r² ≥ 0.9.
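To illustrate why a multi-marker rule can capture a SNP that no single tag can, here is a minimal sketch of evaluating one rule X ⇒ y, where X is a small set of tag SNPs. It assumes one simple prediction scheme, not necessarily FastTagger's exact definition: each observed combination of tag alleles is mapped to the target allele it co-occurs with most often, then r² and prediction accuracy are computed between predicted and actual target alleles. The names `fit_rule` and `evaluate_rule` and the 0/1 haplotype encoding are hypothetical.

```python
from collections import Counter, defaultdict

def r_squared(a, b):
    """Standard r^2 between two 0/1 allele vectors (one entry per haplotype)."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # monomorphic vector: treat r^2 as 0
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom

def fit_rule(haplotypes, tags, target):
    """Assumed scheme: map each observed tag-allele combination to the target
    allele it co-occurs with most often (ties: first allele encountered)."""
    counts = defaultdict(Counter)
    for h in haplotypes:
        counts[tuple(h[t] for t in tags)][h[target]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def evaluate_rule(haplotypes, tags, target):
    """Return (r^2, accuracy) of the rule tags => target, measured in-sample
    on the same haplotypes used to fit the lookup table."""
    table = fit_rule(haplotypes, tags, target)
    predicted = [table[tuple(h[t] for t in tags)] for h in haplotypes]
    actual = [h[target] for h in haplotypes]
    accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
    return r_squared(predicted, actual), accuracy

# Target SNP 2 is the XOR of SNPs 0 and 1: neither SNP alone tags it,
# but the pair {0, 1} predicts it perfectly.
haplotypes = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(evaluate_rule(haplotypes, (0,), 2))    # (0.0, 0.5)
print(evaluate_rule(haplotypes, (0, 1), 2))  # (1.0, 1.0)
```

The accuracy here is in-sample; using a rule for imputation would instead fit the lookup table on reference haplotypes and apply it to new samples, where an unseen tag-allele combination would need a fallback.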

CONCLUSIONS

Generating multi-marker tagging rules is a computation-intensive task and is the bottleneck of existing multi-marker-based tag SNP selection methods. FastTagger is a practical and scalable algorithm for solving this problem.
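Because rule generation is the expensive step, the selection step that consumes the rules can be comparatively simple. The sketch below assumes a greedy set-cover heuristic over already generated rules; the abstract does not specify FastTagger's selection procedure, and the function name `greedy_tag_selection` and the rule format `(frozenset_of_tags, target)` are hypothetical.

```python
def greedy_tag_selection(snps, rules):
    """Greedy tag SNP selection over precomputed rules (assumed heuristic).

    snps:  iterable of SNP ids that must all be covered.
    rules: list of (tags, target) pairs; tags is a frozenset of SNP ids.
    A SNP counts as covered if it is selected itself, or if every tag SNP
    of some rule predicting it has been selected.
    """
    snps = set(snps)
    selected = set()

    def covered(chosen):
        cov = set(chosen)
        for tags, target in rules:
            if tags <= chosen:  # all tags of this rule are genotyped
                cov.add(target)
        return cov & snps

    current = covered(selected)
    while current != snps:
        # Add the SNP that newly covers the most SNPs; sorting makes ties
        # resolve to the smallest id, since max() keeps the first maximum.
        best = max(sorted(snps - selected),
                   key=lambda s: len(covered(selected | {s}) - current))
        selected.add(best)
        current = covered(selected)
    return selected

rules = [(frozenset({0}), 1), (frozenset({0, 2}), 3), (frozenset({2}), 4)]
print(greedy_tag_selection(range(5), rules))  # {0, 2}
```

This version rescans every rule for every candidate, which is fine for illustration but far from scalable; it deliberately says nothing about how FastTagger speeds up rule generation itself.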

Similar articles

Genome-wide selection of tag SNPs using multiple-marker correlation.

Bioinformatics. 2007 Dec 1;23(23):3178-84. doi: 10.1093/bioinformatics/btm496. Epub 2007 Nov 15.

TAGster: efficient selection of LD tag SNPs in single or multiple populations.

Bioinformatics. 2007 Dec 1;23(23):3254-5. doi: 10.1093/bioinformatics/btm426. Epub 2007 Sep 7.

Multi-marker-LD based genetic algorithm for tag SNP selection.

Interdiscip Sci. 2014 Dec;6(4):303-11. doi: 10.1007/s12539-012-0060-x. Epub 2014 Aug 9.

Power-based, phase-informed selection of single nucleotide polymorphisms for disease association screens.

Genet Epidemiol. 2006 Sep;30(6):459-70. doi: 10.1002/gepi.20159.

HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms.

Bioinformatics. 2005 Jan 1;21(1):131-4. doi: 10.1093/bioinformatics/bth482. Epub 2004 Aug 27.

A new model of multi-marker correlation for genome-wide tag SNP selection.

Genome Inform. 2008;21:27-41.

Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies.

Genome Res. 2004 May;14(5):908-16. doi: 10.1101/gr.1837404. Epub 2004 Apr 12.

The impact of missing and erroneous genotypes on tagging SNP selection and power of subsequent association tests.

Hum Hered. 2006;61(1):31-44. doi: 10.1159/000092141. Epub 2006 Mar 23.

MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression.

Bioinformatics. 2006 Oct 15;22(20):2558-61. doi: 10.1093/bioinformatics/btl420. Epub 2006 Aug 7.

Cited by

Genomic prediction of morphometric and colorimetric traits in Solanaceous fruits.

Hortic Res. 2022 Mar 23;9:uhac072. doi: 10.1093/hr/uhac072. eCollection 2022.

Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations.

PLoS Comput Biol. 2022 Jan 13;18(1):e1009628. doi: 10.1371/journal.pcbi.1009628. eCollection 2022 Jan.

SNP variable selection by generalized graph domination.

PLoS One. 2019 Jan 24;14(1):e0203242. doi: 10.1371/journal.pone.0203242. eCollection 2019.

eQTL discovery and their association with severe equine asthma in European Warmblood horses.

BMC Genomics. 2018 Aug 2;19(1):581. doi: 10.1186/s12864-018-4938-9.

Dual-strain genital herpes simplex virus type 2 (HSV-2) infection in the US, Peru, and 8 countries in sub-Saharan Africa: A nested cross-sectional viral genotyping study.

PLoS Med. 2017 Dec 27;14(12):e1002475. doi: 10.1371/journal.pmed.1002475. eCollection 2017 Dec.

Developing a 670k genotyping array to tag ~2M SNPs across 24 horse breeds.

BMC Genomics. 2017 Jul 27;18(1):565. doi: 10.1186/s12864-017-3943-8.

Discovering Genome-Wide Tag SNPs Based on the Mutual Information of the Variants.

PLoS One. 2016 Dec 16;11(12):e0167994. doi: 10.1371/journal.pone.0167994. eCollection 2016.

A powerful score-based test statistic for detecting gene-gene co-association.

BMC Genet. 2016 Jan 29;17:31. doi: 10.1186/s12863-016-0331-3.

ARG-walker: inference of individual specific strengths of meiotic recombination hotspots by population genomics analysis.

BMC Genomics. 2015;16 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2164-16-S12-S1. Epub 2015 Dec 9.

A primer to frequent itemset mining for bioinformatics.

Brief Bioinform. 2015 Mar;16(2):216-31. doi: 10.1093/bib/bbt074. Epub 2013 Oct 26.
