Coverage statistics for sequence census methods.

Affiliation

Department of Mathematics, University of California, Berkeley, California, USA.

Publication information

BMC Bioinformatics. 2010 Aug 18;11:430. doi: 10.1186/1471-2105-11-430.

DOI:10.1186/1471-2105-11-430
PMID:20718980
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2940910/
Abstract

BACKGROUND

We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions.
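
For orientation, the classic fixed-length setting that the abstract extends can be summarized in a short worked form. The symbols here (N fragments, common length L, genome length G, start-site rate \rho, random length \ell) are notation assumed for illustration, not taken from the abstract, and edge effects near the genome ends are ignored.

```latex
% Classic Lander-Waterman setting: N fragments of fixed length L on a genome of length G.
c = \frac{N L}{G}, \qquad
\Pr(\text{depth at a site} = k) \approx \frac{c^{k} e^{-c}}{k!}, \qquad
\Pr(\text{site uncovered}) \approx e^{-c}.

% With Poisson start sites of rate \rho per base and i.i.d. fragment lengths \ell,
% the depth at an interior site is again Poisson, with mean
c = \rho \, \mathbb{E}[\ell].

% Worked example: c = 5 gives \Pr(\text{site uncovered}) \approx e^{-5} \approx 0.0067.
```

At a single site only the mean fragment length enters, which is one reason the shape of the whole coverage function is the more informative object when length distributions vary.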

RESULTS

Under the mild assumptions that fragment start sites are Poisson distributed and successive fragment lengths are independent and identically distributed, we observe that, regardless of fragment length distribution, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the successive jumps of the coverage function, and show that they can be encoded as a random tree that is approximately a Galton-Watson tree with generation-dependent geometric offspring distributions whose parameters can be computed.
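
The assumptions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the continuous genome coordinate, and the exponential length distribution in the usage example are all assumptions made here. Each fragment is stored as a point (start, length), which is the two-dimensional view described in the abstract, and the coverage function is recovered from its +1 and -1 jumps.

```python
import random

def simulate_fragments(genome_length, rate, draw_length, rng):
    """Return (start, length) pairs with Poisson start sites and i.i.d. lengths.

    Start sites form a Poisson process of intensity `rate`, generated through
    exponential gaps; `draw_length(rng)` is a hypothetical callback that
    samples one fragment length.
    """
    fragments = []
    position = rng.expovariate(rate)
    while position < genome_length:
        fragments.append((position, draw_length(rng)))
        position += rng.expovariate(rate)
    return fragments

def coverage_jumps(fragments):
    """Return the sorted (position, jump) events of the coverage depth function."""
    events = []
    for start, length in fragments:
        events.append((start, 1))            # depth goes up where a fragment starts
        events.append((start + length, -1))  # and down where it ends
    events.sort()
    return events

def depth_at(fragments, x):
    """Count the fragments covering position x (half-open intervals)."""
    return sum(1 for start, length in fragments if start <= x < start + length)

if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example: rate 0.05 per base, exponential lengths with mean 100.
    frags = simulate_fragments(10000.0, 0.05, lambda r: r.expovariate(1 / 100.0), rng)
    print(len(frags), "fragments; depth at midpoint:", depth_at(frags, 5000.0))
```

Swapping the `draw_length` callback changes the length distribution without touching the rest of the simulation, which mirrors the claim that the Poisson structure holds regardless of that distribution.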

CONCLUSIONS

We extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence census based experiments. Our approach leads to explicit determinations of the null distributions of certain test statistics, while for others it greatly simplifies the approximation of their null distributions by simulation. Our focus on fragments also leads to a new approach to visualizing sequencing data that is of independent interest.
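
Approximating a null distribution by simulation might look like the following sketch. The test statistic used here, the maximum coverage depth, is a hypothetical choice for illustration and is not claimed to be one of the statistics studied in the paper; the function names and parameters are likewise assumptions.

```python
import random

def max_depth(genome_length, rate, draw_length, rng):
    """Simulate one experiment under the null model and return its maximum depth."""
    events = []
    position = rng.expovariate(rate)
    while position < genome_length:
        length = draw_length(rng)
        events.append((position, 1))            # fragment start
        events.append((position + length, -1))  # fragment end
        position += rng.expovariate(rate)
    events.sort()
    depth = best = 0
    for _, jump in events:
        depth += jump
        best = max(best, depth)
    return best

def null_distribution(n_sims, genome_length, rate, draw_length, seed=0):
    """Return sorted maximum depths from n_sims independent null simulations."""
    rng = random.Random(seed)
    return sorted(max_depth(genome_length, rate, draw_length, rng)
                  for _ in range(n_sims))

def empirical_p_value(null_values, observed):
    """Share of null values at least as large as observed, with a +1 correction."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)

if __name__ == "__main__":
    # Assumed example: fixed fragment length 100, rate 0.05 per base.
    null = null_distribution(500, 10000.0, 0.05, lambda r: 100.0)
    print("p-value for an observed maximum depth of 15:", empirical_p_value(null, 15))
```

The +1 correction keeps the estimated p-value away from zero when no simulated value reaches the observed one.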

Figures (image links)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9b2/2940910/b02ef14e8c33/1471-2105-11-430-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9b2/2940910/2053ac990ec8/1471-2105-11-430-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9b2/2940910/4244ad58f52f/1471-2105-11-430-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9b2/2940910/f8a36b698f80/1471-2105-11-430-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9b2/2940910/df759186a53b/1471-2105-11-430-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9b2/2940910/d80fa2e72c4b/1471-2105-11-430-6.jpg
