A double classification tree search algorithm for index SNP selection.

Author information

Zhang Peisen, Sheng Huitao, Uehara Ryuhei

Affiliation

Laboratory of Population Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA.

Publication information

BMC Bioinformatics. 2004 Jul 6;5:89. doi: 10.1186/1471-2105-5-89.

DOI:10.1186/1471-2105-5-89

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC476734/

Abstract

BACKGROUND

In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. In order to reduce costs and work, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, requiring brute force algorithms that are not feasible for large data sets.
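To make the cost of exhaustive search concrete, here is a minimal sketch of brute-force selection. It assumes haplotypes are given as equal-length strings of allele codes, and it only checks that the chosen SNP positions separate the distinct haplotype patterns. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def distinguishes(haplotypes, columns):
    """Return True if the chosen SNP columns give every haplotype a unique signature."""
    signatures = {tuple(h[c] for c in columns) for h in haplotypes}
    return len(signatures) == len(haplotypes)


def min_index_snps_bruteforce(haplotypes):
    """Find a smallest set of SNP positions that separates all distinct haplotypes.

    Subsets are tried in order of increasing size, so the first hit is minimum.
    The number of subsets grows exponentially with the number of SNPs.
    """
    unique = sorted(set(haplotypes))
    n_snps = len(unique[0])
    for k in range(n_snps + 1):
        for columns in combinations(range(n_snps), k):
            if distinguishes(unique, columns):
                return list(columns)
    # Not reached for non-empty input: using every column separates distinct haplotypes.
    raise AssertionError("unreachable")


# Toy data (hypothetical): SNPs 0 and 1 are coinherited, as are SNPs 2 and 3.
HAPLOTYPES = ["0000", "0011", "1100", "1111"]
print(min_index_snps_bruteforce(HAPLOTYPES))
```

In this toy case two of the four SNPs suffice, one from each coinherited pair. With n SNPs the search may inspect up to 2^n subsets, which is why this approach does not scale to large data sets.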

RESULTS

We have developed a double classification tree search algorithm to generate index SNPs that can distinguish all SNP and haplotype patterns. This algorithm runs very rapidly and generates very good, though not necessarily minimum, sets of index SNPs, as is to be expected for such NP-complete problems.
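The abstract does not spell out the steps of the double classification tree search, so the sketch below is not the authors' algorithm. It shows a generic greedy heuristic of the same flavor: fast, and producing a valid but not necessarily minimum set of index SNPs. It assumes haplotypes are equal-length strings of allele codes; all names and data are hypothetical.

```python
from collections import defaultdict


def split_groups(groups, haplotypes, column):
    """Refine each group of haplotype indices by the allele at one SNP column."""
    refined = []
    for group in groups:
        by_allele = defaultdict(list)
        for i in group:
            by_allele[haplotypes[i][column]].append(i)
        refined.extend(by_allele.values())
    return refined


def greedy_index_snps(haplotypes):
    """Pick SNP columns one at a time, each time taking the column that
    splits the current groups of still-confused haplotypes into the most groups."""
    unique = sorted(set(haplotypes))
    n_snps = len(unique[0])
    groups = [list(range(len(unique)))]  # start with all haplotypes in one group
    chosen = []
    while len(groups) < len(unique):
        # Ties go to the lowest column index, because max() keeps the first maximum.
        best = max(
            (c for c in range(n_snps) if c not in chosen),
            key=lambda c: len(split_groups(groups, unique, c)),
        )
        chosen.append(best)
        groups = split_groups(groups, unique, best)
    return chosen


# Toy data (hypothetical): SNPs 0 and 1 are coinherited, as are SNPs 2 and 3.
HAPLOTYPES = ["0000", "0011", "1100", "1111"]
print(greedy_index_snps(HAPLOTYPES))
```

Each round rescans the unused columns, so the running time is polynomial in the number of SNPs and haplotypes. The trade-off is that a locally best column can, on some inputs, lead to a larger final set than an exhaustive search would find.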

CONCLUSIONS

A new algorithm for index SNP selection has been developed. A webserver for index SNP selection is available at http://cognia.cu-genome.org/cgi-bin/genome/snpIndex.cgi/
