Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows.

Author information

Roberts Adam, McMillan Leonard, Wang Wei, Parker Joel, Rusyn Ivan, Threadgill David

Affiliation

Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA.

Publication information

Bioinformatics. 2007 Jul 1;23(13):i401-7. doi: 10.1093/bioinformatics/btm220.

DOI:10.1093/bioinformatics/btm220

PMID:17646323

Abstract

MOTIVATION

Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing affected markers and/or samples or, otherwise, imputing the missing data. On small marker sets, imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.
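The KNN vote mentioned above can be illustrated with a minimal Python sketch. It assumes haplotypes are lists of 0/1 allele codes with None for a missing call, measures distance as the number of mismatches over sites called in both haplotypes inside a window centred on the target site, and breaks vote ties toward the nearer neighbour. The names `knn_impute_call`, `k` and `half_window` are hypothetical and are not taken from NPUTE.

```python
from collections import Counter
from typing import List, Optional

Haplotype = List[Optional[int]]  # None marks a missing call (assumption)


def hamming_known(a: Haplotype, b: Haplotype, lo: int, hi: int, skip: int) -> int:
    """Count mismatches over [lo, hi), ignoring `skip` and sites missing in either haplotype."""
    return sum(
        1
        for j in range(lo, hi)
        if j != skip and a[j] is not None and b[j] is not None and a[j] != b[j]
    )


def knn_impute_call(
    haps: List[Haplotype], row: int, col: int, k: int, half_window: int
) -> Optional[int]:
    """Impute haps[row][col] by a majority vote of its k nearest haplotypes in a window."""
    lo = max(0, col - half_window)
    hi = min(len(haps[row]), col + half_window + 1)
    # Only haplotypes that are called at the target site can vote.
    candidates = [
        (hamming_known(haps[row], haps[r], lo, hi, col), r)
        for r in range(len(haps))
        if r != row and haps[r][col] is not None
    ]
    if not candidates:
        return None
    candidates.sort()  # nearest first; equal distances ordered by row index
    votes = Counter(haps[r][col] for _, r in candidates[:k])
    # most_common is stable, so a tied vote goes to the allele seen first (the nearer one).
    return votes.most_common(1)[0][0]
```

For example, with `haps = [[0, 1, None, 1, 0], [0, 1, 1, 1, 0], [0, 1, 1, 1, 1], [1, 0, 0, 0, 1]]`, `knn_impute_call(haps, 0, 2, k=2, half_window=2)` returns 1. Each query rescans every haplotype across the whole window, which is one reason this naive form scales poorly to large panels.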

RESULTS

We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K, 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.
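One way to make sliding-window neighbour searches cheap is sketched below under our own assumptions; it is not presented as the data structure described in the paper. The idea is to keep an n × n matrix of pairwise mismatch counts for the current window and update it incrementally: when the window slides one site to the right, subtract the mismatches of the column that leaves and add those of the column that enters, so each slide costs O(n²) instead of O(n²·w) for window width w. The sketch imputes only the centre site of each window, uses a single nearest neighbour (k = 1) for brevity, assumes an odd width, and leaves sites within `width // 2` of either end untouched. All function names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Haplotype = List[Optional[int]]  # None marks a missing call (assumption)
Matrix = List[List[int]]


def column_mismatch(haps: List[Haplotype], col: int) -> Matrix:
    """n x n 0/1 matrix: 1 where haplotypes i and j are both called at `col` and differ."""
    n = len(haps)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = haps[i][col], haps[j][col]
            if a is not None and b is not None and a != b:
                m[i][j] = m[j][i] = 1
    return m


def _accumulate(d: Matrix, m: Matrix, sign: int) -> None:
    """In place: d += sign * m."""
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            d[i][j] += sign * v


def sliding_mismatch_counts(haps: List[Haplotype], width: int) -> Iterator[Tuple[int, Matrix]]:
    """Yield (start, D) where D[i][j] counts mismatches of i and j in [start, start + width)."""
    n, n_sites = len(haps), len(haps[0])
    if width < 1 or width > n_sites:
        return
    d = [[0] * n for _ in range(n)]
    for col in range(width):  # build the first window from scratch
        _accumulate(d, column_mismatch(haps, col), +1)
    yield 0, [row[:] for row in d]
    for start in range(1, n_sites - width + 1):
        # Slide one site: remove the column leaving, add the column entering.
        _accumulate(d, column_mismatch(haps, start - 1), -1)
        _accumulate(d, column_mismatch(haps, start + width - 1), +1)
        yield start, [row[:] for row in d]  # copy so callers may keep it


def impute_window_centres(haps: List[Haplotype], width: int) -> List[Haplotype]:
    """1-NN imputation of missing calls at the centre site of every window (odd width)."""
    out = [row[:] for row in haps]
    half = width // 2
    for start, d in sliding_mismatch_counts(haps, width):
        c = start + half
        for i, row in enumerate(haps):
            if row[c] is None:
                donors = [(d[i][j], j) for j in range(len(haps))
                          if j != i and haps[j][c] is not None]
                if donors:  # nearest donor; ties go to the lower row index
                    out[i][c] = haps[min(donors)[1]][c]
    return out
```

Because the counts for every window are produced in a single left-to-right pass, trying a different width only means repeating that pass. A window size could then be scored by hiding some known calls, imputing them, and measuring agreement with the hidden values.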

AVAILABILITY

A free open source software package, NPUTE, is available at http://compgen.unc.edu/software, for non-commercial uses.

Similar articles

SNP-PHAGE--High throughput SNP discovery pipeline.

BMC Bioinformatics. 2006 Oct 23;7:468. doi: 10.1186/1471-2105-7-468.

Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays.

Bioinformatics. 2005 May 1;21(9):1958-63. doi: 10.1093/bioinformatics/bti275. Epub 2005 Jan 18.

An efficient comprehensive search algorithm for tagSNP selection using linkage disequilibrium criteria.

Bioinformatics. 2006 Jan 15;22(2):220-5. doi: 10.1093/bioinformatics/bti762. Epub 2005 Nov 3.

Inference of missing SNPs and information quantity measurements for haplotype blocks.

Bioinformatics. 2005 May 1;21(9):2001-7. doi: 10.1093/bioinformatics/bti261. Epub 2005 Feb 4.

2SNP: scalable phasing based on 2-SNP haplotypes.

Bioinformatics. 2006 Feb 1;22(3):371-3. doi: 10.1093/bioinformatics/bti785. Epub 2005 Nov 15.

Simulating association studies: a data-based resampling method for candidate regions or whole genome scans.

Bioinformatics. 2007 Oct 1;23(19):2581-8. doi: 10.1093/bioinformatics/btm386. Epub 2007 Sep 4.

BNTagger: improved tagging SNP selection using Bayesian networks.

Bioinformatics. 2006 Jul 15;22(14):e211-9. doi: 10.1093/bioinformatics/btl233.

A highly informative SNP linkage panel for human genetic studies.

Nat Methods. 2004 Nov;1(2):113-7. doi: 10.1038/nmeth712. Epub 2004 Oct 21.

Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays.

Nat Methods. 2004 Nov;1(2):109-11. doi: 10.1038/nmeth718.

Cited by

Prediction of biomass sorghum hybrids using environmental feature-enriched genomic combining ability models in tropical environments.

Theor Appl Genet. 2025 May 9;138(6):113. doi: 10.1007/s00122-025-04895-y.

Genomic selection of maize test-cross hybrids leveraged by marker sampling.

Plant Genome. 2025 Jun;18(2):e70030. doi: 10.1002/tpg2.70030.

The Identification of a Single-Base Mutation in the Maize Gene Responsible for Reduced Plant Height in the Mutant 16N125.

Plants (Basel). 2025 Apr 15;14(8):1217. doi: 10.3390/plants14081217.

Identification and functional validation of a new gene conferring resistance to strains SC4 and SC20 in soybean.

Front Plant Sci. 2025 Jan 27;15:1518829. doi: 10.3389/fpls.2024.1518829. eCollection 2024.

Genome-wide association study of sucrose content in vegetable soybean.

BMC Plant Biol. 2024 Dec 27;24(1):1264. doi: 10.1186/s12870-024-06006-3.

Multi-trait association mapping for phosphorous efficiency reveals flexible root architectures in sorghum.

BMC Plant Biol. 2024 Jun 15;24(1):562. doi: 10.1186/s12870-024-05183-5.

Joint analysis of phenotype-effect-generation identifies loci associated with grain quality traits in rice hybrids.

Nat Commun. 2023 Jul 4;14(1):3930. doi: 10.1038/s41467-023-39534-x.

Lipidomic profiling of the hepatic esterified fatty acid composition in diet-induced nonalcoholic fatty liver disease in genetically diverse Collaborative Cross mice.

J Nutr Biochem. 2022 Nov;109:109108. doi: 10.1016/j.jnutbio.2022.109108. Epub 2022 Jul 17.

KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis.

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac202.

Genetic analyses of lodging resistance and yield provide insights into post-Green-Revolution breeding in rice.

Plant Biotechnol J. 2021 Apr;19(4):814-829. doi: 10.1111/pbi.13509. Epub 2020 Dec 16.
