Estimating the information value of polymorphic sites using pooled sequences.

Author information

Malde Ketil

Publication information

BMC Genomics. 2014;15 Suppl 6(Suppl 6):S20. doi: 10.1186/1471-2164-15-S6-S20. Epub 2014 Oct 17.

DOI:10.1186/1471-2164-15-S6-S20

PMID:25571927

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4239578/

Abstract

BACKGROUND

High-throughput sequencing is a cost-effective method for identifying genetic variation, and it is currently in use on a large scale across the field of biology, including ecology and population genetics. Correctly identifying variable sites and allele frequencies from sequencing data remains challenging, in large part due to artifacts and biases inherent in the sequencing process. Selecting variants that are diagnostic is commonly done using diversity statistics like FST, but these measures are not ideal for the task.

RESULTS

Here, we develop a method that directly calculates the expected amount of information gained from observing each variant site. We then develop and implement a conservative estimator that takes into account uncertainty introduced by sampling bias and sequencing error. This estimator is applied to simulated and real sequencing data, and we discuss how it performs compared to the commonly used existing methods for identifying diagnostic polymorphisms.
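
As a rough illustration of what "expected information gained" from a site can mean, the sketch below computes the mutual information, in bits, between pool of origin and observed allele. This is only one plausible reading of the abstract, not the paper's estimator: the function names and inputs are hypothetical, the two pools are assumed equally likely a priori, and observed frequencies are taken at face value with no correction for sampling or sequencing error.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information(counts_a, counts_b):
    """Mutual information (bits) between pool label and observed allele.

    counts_a, counts_b: allele counts at one site in two pools
    (hypothetical inputs; each pool needs at least one observation).
    Assumes an equal prior on the two pools and uses the raw observed
    frequencies, so it is NOT a conservative estimate.
    """
    freqs_a = [c / sum(counts_a) for c in counts_a]
    freqs_b = [c / sum(counts_b) for c in counts_b]
    mixture = [(fa + fb) / 2 for fa, fb in zip(freqs_a, freqs_b)]
    # I(pool; allele) = H(mixture) - mean of the per-pool entropies
    return entropy(mixture) - (entropy(freqs_a) + entropy(freqs_b)) / 2

print(expected_information([20, 0], [0, 20]))    # fixed difference: 1.0
print(expected_information([10, 10], [10, 10]))  # identical pools: 0.0
```

Under these assumptions a fixed difference between pools yields one full bit per observation, identical allele frequencies yield zero, and intermediate cases fall in between, which is what makes the measure easy to interpret.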

CONCLUSION

The expected information content gives an easy-to-interpret measure for the usefulness of variant sites. The results show that we achieve a clear separation between true variants and noise, allowing us to select candidate sites with a high degree of confidence.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/185a/4239578/6f72d18be10c/1471-2164-15-S6-S20-1.jpg
