Detecting epigenetic motifs in low coverage and metagenomics settings.

Publication information

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S16. doi: 10.1186/1471-2105-15-S9-S16. Epub 2014 Sep 10.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4168715/

Abstract

BACKGROUND

It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base into the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes.
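To illustrate the kinetic principle described above, the sketch compares the mean IPD (interpulse duration) observed at each template position in a native sample with the mean IPD at the same position in an unmodified control. The function names, the toy IPD values and the 2.0 cutoff are hypothetical choices for illustration; this is not the authors' detection statistic.

```python
from statistics import mean


def ipd_ratio(native_ipds, control_ipds):
    """Mean native IPD divided by mean control (unmodified) IPD at one template site."""
    if not native_ipds or not control_ipds:
        raise ValueError("need at least one IPD from each sample")
    return mean(native_ipds) / mean(control_ipds)


def flag_modified(native_by_pos, control_by_pos, threshold=2.0):
    """Return template positions whose IPD ratio reaches a hypothetical threshold.

    native_by_pos / control_by_pos map a template position to the list of IPDs
    observed there across reads; positions missing from either sample are skipped.
    """
    flagged = []
    for pos in sorted(native_by_pos):
        if pos in control_by_pos and ipd_ratio(native_by_pos[pos], control_by_pos[pos]) >= threshold:
            flagged.append(pos)
    return flagged


# Toy data: the polymerase slows down at position 11 in the native sample only.
native = {10: [0.9, 1.1], 11: [3.0, 2.6, 3.4]}
control = {10: [1.0, 1.0], 11: [1.0, 1.2]}
print(flag_modified(native, control))  # [11]
```

With only one or two reads per site, a per-position ratio like this is noisy, which is why low coverage makes individual sites hard to call; pooling observations across every occurrence of a kmer is one way to recover the signal.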

METHODS

Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification-free) IPD distributions using log-likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer's neighborhood.
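The three steps of the method (kmer decomposition, native-versus-WGA comparison by log-likelihood ratio, and greedy selection with neighborhood correction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the EMMCKmer implementation: log IPDs are modeled as normally distributed, a kmer pools the IPDs at every base its occurrences cover, only kmers slower than the control are scored, and the neighborhood correction is approximated by masking the bases of an already selected kmer before rescoring. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def kmer_starts(seq, k):
    """Deconstruct a sequence (genome or contig) into kmers: kmer -> start offsets."""
    starts = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i)
    return starts


def pooled_log_ipds(ipds, starts, k, masked):
    """Log IPDs at every unmasked base covered by any occurrence of one kmer."""
    covered = {j for s in starts for j in range(s, s + k)} - masked
    return [math.log(ipds[j]) for j in sorted(covered)]


def gauss_loglik(xs, mu, sd):
    """Log-likelihood of xs under a normal distribution N(mu, sd^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd) for x in xs)


def kmer_llr(nat, wga, min_sd=1e-3):
    """How much better a native-fitted normal explains native log IPDs than a WGA-fitted one."""
    if len(nat) < 2 or len(wga) < 2:
        return 0.0
    mu_n, sd_n = mean(nat), max(pstdev(nat), min_sd)
    mu_w, sd_w = mean(wga), max(pstdev(wga), min_sd)
    if mu_n <= mu_w:  # assumption: only slower-than-control kmers count as modified
        return 0.0
    return gauss_loglik(nat, mu_n, sd_n) - gauss_loglik(nat, mu_w, sd_w)


def greedy_motifs(seq, native_ipds, wga_ipds, k, n_motifs, min_llr=0.0):
    """Rank kmers by LLR, pick the best, mask its bases, and rescore the rest."""
    starts = kmer_starts(seq, k)
    masked, chosen = set(), []
    for _ in range(n_motifs):
        best, best_score = None, min_llr
        for kmer in sorted(starts):  # sorted for deterministic tie-breaking
            if kmer in chosen:
                continue
            nat = pooled_log_ipds(native_ipds, starts[kmer], k, masked)
            wga = pooled_log_ipds(wga_ipds, starts[kmer], k, masked)
            score = kmer_llr(nat, wga)
            if score > best_score:
                best, best_score = kmer, score
        if best is None:
            break
        chosen.append(best)
        # Neighborhood correction (assumed form): bases explained by the chosen kmer
        # no longer count toward overlapping kmers such as ATCA or AGAT.
        masked.update(j for s in starts[best] for j in range(s, s + k))
    return chosen


# Toy data: every base of each GATC occurrence is slowed ~3x in the native sample.
seq = "GATCAAAA" * 4
noise = [0.9 if j % 2 == 0 else 1.1 for j in range(len(seq))]
wga = noise[:]  # control: no modification anywhere
native = [x * (3.0 if j % 8 < 4 else 1.0) for j, x in enumerate(noise)]
print(greedy_motifs(seq, native, wga, k=4, n_motifs=3))  # ['GATC']
```

In the toy data, overlapping kmers such as ATCA and AGAT inherit part of the GATC signal and score highly in the first round; once the GATC bases are masked, their LLR falls to zero, so only GATC is reported. Because the summed LLR grows with the number of pooled observations, frequent kmers are favored; this sketch does not correct for that.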

CONCLUSIONS

Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomic contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection.
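The idea of using motifs to group metagenomic contigs can be illustrated with a deliberately simple scheme: each contig gets a per-motif score (here assumed to be something like a mean IPD ratio at the motif's occurrences), the scores are thresholded into a binary profile, and contigs with identical profiles are placed in the same bin. The scores, the 2.0 cutoff, the function names and the exact-match grouping are hypothetical; the paper's own clustering procedure may differ.

```python
from collections import defaultdict


def motif_signature(scores, motifs, threshold):
    """Binary profile: True where a motif's score on this contig reaches the cutoff."""
    return tuple(scores.get(m, 0.0) >= threshold for m in motifs)


def bin_contigs_by_motifs(contig_scores, motifs, threshold=2.0):
    """Group contigs that share the same motif-modification profile."""
    bins = defaultdict(list)
    for contig in sorted(contig_scores):
        bins[motif_signature(contig_scores[contig], motifs, threshold)].append(contig)
    return dict(bins)


contig_scores = {
    "contig1": {"GATC": 3.1, "GANTC": 1.0},
    "contig2": {"GATC": 2.8},
    "contig3": {"GATC": 1.1, "GANTC": 2.5},
}
print(bin_contigs_by_motifs(contig_scores, ["GATC", "GANTC"]))
# {(True, False): ['contig1', 'contig2'], (False, True): ['contig3']}
```

Pooling the IPDs of all contigs in one bin raises the effective coverage for each motif, which is the intuition behind rerunning motif detection on clustered contigs.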

AVAILABILITY

https://github.com/alibashir/EMMCKmer.
