A Bayesian framework to identify methylcytosines from high-throughput bisulfite sequencing data.

Author information

Xie Qing, Liu Qi, Mao Fengbiao, Cai Wanshi, Wu Honghu, You Mingcong, Wang Zhen, Chen Bingyu, Sun Zhong Sheng, Wu Jinyu

Affiliations

Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China.

Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China.

Publication information

PLoS Comput Biol. 2014 Sep 25;10(9):e1003853. doi: 10.1371/journal.pcbi.1003853. eCollection 2014 Sep.

Abstract

High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling based on bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity for low methylation level samples (<1%) than Lister. A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.
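To make the idea of Bayesian methylcytosine calling concrete, the following is a minimal toy sketch, not the Bycom model itself, whose details are not given in the abstract. It assumes a single cytosine site covered by `n` reads, of which `k` read as C, and compares two hypotheses: the site is unmethylated in all cells, or a fraction `m` of cells is methylated (a crude stand-in for the cell mixture the abstract calls cell heterozygosis). The function names, the uniform prior over `m`, the prior probability of methylation, and the default conversion and error rates are all illustrative assumptions.

```python
from math import comb


def prob_read_c(m, conversion=0.99, seq_error=0.005):
    """Probability that one read shows C when a fraction m of cells is methylated.

    Assumed toy error model: an unmethylated C escapes conversion with
    probability (1 - conversion); any base is misread with probability
    seq_error, and a converted T is misread specifically as C with
    probability seq_error / 3.
    """
    p_unmeth = (1 - conversion) * (1 - seq_error) + conversion * seq_error / 3
    p_meth = 1 - seq_error
    return m * p_meth + (1 - m) * p_unmeth


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_methylated(k, n, prior=0.05, grid=100,
                         conversion=0.99, seq_error=0.005):
    """Posterior probability that the site is methylated in some cells.

    H0: methylation fraction is 0 (C reads come only from non-conversion
        and sequencing error).
    H1: methylation fraction m is uniform on (0, 1); the marginal likelihood
        is approximated by averaging over a midpoint grid.
    """
    like_h0 = binom_pmf(k, n, prob_read_c(0.0, conversion, seq_error))
    levels = [(i + 0.5) / grid for i in range(grid)]
    like_h1 = sum(
        binom_pmf(k, n, prob_read_c(m, conversion, seq_error)) for m in levels
    ) / grid
    numerator = prior * like_h1
    return numerator / (numerator + (1 - prior) * like_h0)


if __name__ == "__main__":
    # Hypothetical read counts: (C reads, total reads) at three sites.
    for k, n in [(0, 20), (1, 20), (10, 20)]:
        print(k, n, round(posterior_methylated(k, n), 4))
```

Under these assumptions, a single C read among 20 is explained mostly by incomplete conversion and sequencing error, so the posterior stays low, whereas several C reads push it close to 1. A real caller would estimate the conversion rate from the data, for example from spiked-in unmethylated control DNA, rather than fixing it.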

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d280/4177668/e82452849f7e/pcbi.1003853.g001.jpg
