kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity.

Authors

Murray Kevin D, Webers Christfried, Ong Cheng Soon, Borevitz Justin, Warthmann Norman

Affiliations

Research School of Biology, The Australian National University, Canberra, Australia.

Data61, CSIRO, Canberra, Australia.

Publication information

PLoS Comput Biol. 2017 Sep 5;13(9):e1005727. doi: 10.1371/journal.pcbi.1005727. eCollection 2017 Sep.

Abstract

Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals (or "samples") in an unbiased manner, preferably de novo. Rapid estimation of genetic relatedness directly from sequencing data has the potential to overcome reference genome bias, and to verify that individuals belong to the correct genetic lineage before conclusions are drawn from mislabelled or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly- and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes, and its applications include establishing sample identity and detecting sample mix-ups, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets, we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
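To make the idea of a weighted inner product over k-mer counts concrete, the following is a minimal Python sketch, not the kWIP implementation. It rests on several assumptions that the abstract does not state: k-mers are counted exactly with a `Counter` rather than with a probabilistic data structure, each k-mer is weighted by the binary entropy of its presence frequency across samples, and similarities are normalised by the self-similarities before being converted to a distance. All function names are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence (exact counts, no sketching)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def entropy_weights(samples):
    """Weight each k-mer by the binary entropy of its presence frequency.

    ASSUMPTION: this weighting scheme is chosen for illustration; the
    abstract only says that the inner product is weighted.
    """
    n = len(samples)
    # Iterating a Counter yields its keys, so each sample votes once per k-mer.
    presence = Counter(kmer for counts in samples for kmer in counts)
    weights = {}
    for kmer, present in presence.items():
        if present == n:
            # A k-mer found in every sample cannot distinguish samples.
            weights[kmer] = 0.0
        else:
            f = present / n
            weights[kmer] = -(f * log2(f) + (1 - f) * log2(1 - f))
    return weights


def weighted_inner_product(a, b, weights):
    """Sum of weight * count_a * count_b over the k-mers shared by both samples."""
    return sum(weights[kmer] * a[kmer] * b[kmer] for kmer in a.keys() & b.keys())


def distance_matrix(samples, weights):
    """Convert normalised pairwise similarities into a symmetric distance matrix."""
    n = len(samples)
    self_sim = [weighted_inner_product(s, s, weights) for s in samples]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            denom = sqrt(self_sim[i] * self_sim[j])
            # If a sample has no informative k-mers, treat similarity as zero.
            sim = weighted_inner_product(samples[i], samples[j], weights) / denom if denom else 0.0
            # Distance between unit-normalised vectors; clamp rounding error at 0.
            dist[i][j] = dist[j][i] = sqrt(max(0.0, 2.0 - 2.0 * sim))
    return dist


# Toy input: the first two sequences differ by one base, the third is unrelated.
toy = [kmer_counts(s, 4) for s in ["ACGTACGTAC", "ACGTACGTTC", "TTGCAAGGCT"]]
toy_dist = distance_matrix(toy, entropy_weights(toy))
for row in toy_dist:
    print(" ".join(f"{d:.3f}" for d in row))
```

On this toy input the first two sequences come out closer to each other than either is to the third. Real sequencing runs contain far too many distinct k-mers for exact dictionaries to be practical, which is presumably where the probabilistic data structure mentioned in the abstract comes in.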

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a69/5600398/ce9d8ac34560/pcbi.1005727.g001.jpg
