Estimation of CpG coverage in whole methylome next-generation sequencing studies.

Author information

Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, 1112 East Clay Street, P.O. Box 980533, Richmond, VA 23298, USA.

Publication information

BMC Bioinformatics. 2013 Feb 12;14:50. doi: 10.1186/1471-2105-14-50.

Abstract

BACKGROUND

Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical for detecting disease-relevant sites. A cost-effective approach is next-generation sequencing (NGS) of single-end libraries created from samples that are enriched for methylated DNA fragments. A limitation of single-end libraries is that the fragment size distribution is not observed. This hampers several aspects of the data analysis, such as the calculation of enrichment measures that are based on the number of fragments covering the CpGs.
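To make the limitation concrete, the following is a minimal sketch of an extended read count, in which every single-end read is assumed to come from a fragment of one fixed length. The function name, the `(five_prime_pos, strand)` read representation, and the `extension` value are illustrative assumptions, not taken from the paper.

```python
def extended_read_coverage(reads, cpg_pos, extension=200):
    """Count reads whose fragment, ASSUMED to be `extension` bp long, covers cpg_pos.

    reads: iterable of (five_prime_pos, strand), strand is '+' or '-'.
    All names and the fixed extension are hypothetical, for illustration only.
    """
    count = 0
    for pos, strand in reads:
        # Distance from the read's 5' end to the CpG, measured along the fragment.
        d = cpg_pos - pos if strand == '+' else pos - cpg_pos
        if 0 <= d < extension:
            count += 1
    return count


reads = [(100, '+'), (250, '+'), (400, '-'), (900, '-')]
print(extended_read_coverage(reads, cpg_pos=300, extension=200))
print(extended_read_coverage(reads, cpg_pos=300, extension=300))
```

The two calls differ only in the assumed fragment length, yet they give different counts for the same reads. This shows how a coverage estimate can depend on a quantity that single-end data do not reveal directly.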

RESULTS

We developed a non-parametric method that uses isolated CpGs to estimate sample-specific fragment size distributions from the empirical sequencing data. Through simulations we show that our method is highly accurate. While the traditional (extended) read count methods resulted in severely biased coverage estimates and introduced artificial inter-individual differences, through the use of the estimated fragment size distributions we could remove these biases almost entirely. Furthermore, we found correlations of 0.999 between coverage estimates obtained using fragment size distributions that were estimated with our method and those obtained using distributions that were "observed" in paired-end sequencing data.
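The abstract does not spell out the estimator, so the following is only a rough sketch of one way such an approach could work, not the authors' published algorithm. It assumes that a read near an isolated CpG comes from a fragment that contains that CpG, so the number of reads starting at distance d from the CpG is roughly proportional to P(fragment length > d). Each read is then weighted by that probability instead of being counted as 0 or 1. All function names and the crude monotone smoothing step are hypothetical.

```python
from collections import Counter


def estimate_survival(distances, max_len):
    """Estimate S[d] ~ P(fragment length > d) from read-to-CpG distances
    pooled over isolated CpGs (assumption: counts at distance d scale with S[d])."""
    counts = Counter(d for d in distances if 0 <= d < max_len)
    curve = [counts.get(d, 0) for d in range(max_len)]
    # Crude smoothing: force the curve to be non-increasing in d by taking
    # a running maximum from the right (a stand-in for isotonic regression).
    for d in range(max_len - 2, -1, -1):
        curve[d] = max(curve[d], curve[d + 1])
    if curve[0] == 0:
        raise ValueError("no reads within max_len of the isolated CpGs")
    return [c / curve[0] for c in curve]


def weighted_coverage(reads, cpg_pos, survival):
    """Sum, over reads, of the estimated probability that the read's
    fragment extends far enough to cover cpg_pos."""
    total = 0.0
    for pos, strand in reads:
        d = cpg_pos - pos if strand == '+' else pos - cpg_pos
        if 0 <= d < len(survival):
            total += survival[d]
    return total


# Toy distances (in bp) from read 5' ends to their nearest isolated CpG.
toy_distances = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
survival = estimate_survival(toy_distances, max_len=4)
print(survival)
print(weighted_coverage([(10, '+'), (13, '-')], cpg_pos=12, survival=survival))
```

In contrast to a fixed extension, the weights here come from the sample's own reads, which is the sense in which such an estimate is sample-specific. A real analysis would need far more reads and a more careful definition of "isolated" than this toy input suggests.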

CONCLUSIONS

We propose a non-parametric method for estimating fragment size distributions that is highly precise and can improve the analysis of cost-effective MWAS that sequence single-end libraries created from samples that are enriched for methylated DNA fragments.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/612c/3599116/d2fb84120fad/1471-2105-14-50-1.jpg
