Average weighted nucleotide diversity is more precise than pixy in estimating the true value of π from sequence sets containing missing data.

Author information

Konopiński Maciej K

Affiliation

Institute of Nature Conservation Polish Academy of Sciences, Kraków, Poland.

Publication information

Mol Ecol Resour. 2023 Feb;23(2):348-354. doi: 10.1111/1755-0998.13707. Epub 2022 Sep 6.

DOI:10.1111/1755-0998.13707

PMID:36031871

Abstract

Nucleotide diversity remains an important statistic in population genetic/genomic studies. Although recent advances in massive sequencing make generating sequence data sets cheaper and faster, currently used technologies often introduce substantial amounts of missing nucleotides in their output. A novel method of estimating π from data sets containing missing data, pixy, has also recently been proposed. In this study, the pixy estimator, π_pixy, was compared to average weighted nucleotide diversity, π_w. The estimators were tested both on sequences simulated in fastsimcoal and on real sequence sets. Both sets were modified by random insertion of missing nucleotides. Weighted nucleotide diversity performed better in all pairwise comparisons. It was characterized by a smaller error and a narrower distribution of the results. π_pixy tends to overestimate the nucleotide diversity when both the proportion of missing data and the level of variation are low. Of the two estimators, only π_w estimated the true nucleotide diversity in part of the simulations. A simple formula for estimating π_w allows for easy integration of the estimator in packages such as pixy, which would allow obtaining more precise estimates of nucleotide diversity either in a sliding window or for discrete genomic regions.
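To make the idea of estimating π from sequences with missing nucleotides concrete, the sketch below computes a ratio of summed pairwise differences to summed valid comparisons, which is the general approach usually attributed to pixy. It is an illustration only: the function name `pixy_style_pi`, the `MISSING` symbol set, and the toy alignment are assumptions made here, and it is not the π_w formula from the paper, which the abstract does not state.

```python
import math
from itertools import combinations

# Symbols treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def pixy_style_pi(sequences):
    """Summed pairwise differences divided by summed valid comparisons.

    A comparison at a site counts only when both sequences carry a
    non-missing nucleotide there.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    differences = 0
    comparisons = 0
    for seq_a, seq_b in combinations(sequences, 2):
        for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
            if base_a in MISSING or base_b in MISSING:
                continue  # skip sites missing in either sequence
            comparisons += 1
            if base_a != base_b:
                differences += 1

    # No valid comparison at all: the estimate is undefined.
    return differences / comparisons if comparisons else math.nan


# Hypothetical toy alignment with one missing nucleotide.
toy_alignment = ["ACGT", "ACGA", "NCGT"]
print(pixy_style_pi(toy_alignment))
```

In the toy alignment, the missing first site of the third sequence removes two of the twelve possible comparisons, so the denominator shrinks together with the numerator instead of the missing site being counted as a match.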

