Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows.

Author information

Roberts Adam, McMillan Leonard, Wang Wei, Parker Joel, Rusyn Ivan, Threadgill David

Affiliation

Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA.

Publication information

Bioinformatics. 2007 Jul 1;23(13):i401-7. doi: 10.1093/bioinformatics/btm220.

Abstract

MOTIVATION

Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing affected markers and/or samples or, otherwise, imputing the missing data. On small marker sets imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.

RESULTS

We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K, 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.
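The abstract does not spell out the data structure, so the following is only a minimal sketch of one way constant-time window distances could be obtained: per-pair prefix sums of mismatches, from which the distance over any window is a single subtraction. The names `build_mismatch_prefix`, `window_distance` and `impute`, the `?` missing-call symbol, and the toy haplotypes are all assumptions made for illustration; they are not taken from NPUTE.

```python
from collections import Counter
from itertools import combinations

MISSING = "?"  # hypothetical symbol for a missing call


def build_mismatch_prefix(haps):
    """prefix[(i, j)][k] = mismatches between rows i and j over sites [0, k)."""
    prefix = {}
    for i, j in combinations(range(len(haps)), 2):
        acc = [0]
        for a, b in zip(haps[i], haps[j]):
            # A missing call on either side is not counted as a mismatch.
            mismatch = a != b and MISSING not in (a, b)
            acc.append(acc[-1] + int(mismatch))
        prefix[(i, j)] = acc
    return prefix


def window_distance(prefix, i, j, lo, hi):
    """Mismatch count between rows i and j over sites [lo, hi), in O(1)."""
    acc = prefix[(min(i, j), max(i, j))]
    return acc[hi] - acc[lo]


def impute(haps, prefix, sample, site, half_window, k=1):
    """Guess haps[sample][site] by a vote of the k nearest rows in the window."""
    lo = max(0, site - half_window)
    hi = min(len(haps[sample]), site + half_window + 1)
    candidates = [
        (window_distance(prefix, sample, other, lo, hi), other)
        for other in range(len(haps))
        if other != sample and haps[other][site] != MISSING
    ]
    nearest = sorted(candidates)[:k]
    if not nearest:
        return MISSING  # no row has a known call at this site
    votes = Counter(haps[other][site] for _, other in nearest)
    return votes.most_common(1)[0][0]


# Toy panel: four haplotypes, seven sites, one missing call in row 1.
haps = ["0011010", "0011?10", "1100101", "0110110"]
prefix = build_mismatch_prefix(haps)
print(impute(haps, prefix, sample=1, site=4, half_window=3))       # 0
print(impute(haps, prefix, sample=1, site=4, half_window=3, k=3))  # 1
```

Because each window distance is one subtraction, trying every window size at a site costs little beyond the initial pass over the data. This naive version stores one array per pair of rows, so its memory grows with the square of the sample count times the number of sites; a practical tool for panels with millions of SNPs would need a more compact layout.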

AVAILABILITY

A free, open-source software package, NPUTE, is available for non-commercial use at http://compgen.unc.edu/software.
