Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

Affiliations

Department of Automation, Xiamen University, Xiamen, Fujian, 361005 China.

Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA.

Publication information

Sci Rep. 2016 Nov 23;6:37243. doi: 10.1038/srep37243.

DOI:10.1038/srep37243

PMID:27876823

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5120338/

Abstract

Comparing microbial sequencing datasets is critical for understanding the dynamics of microbial communities. Alignment-based tools for analyzing metagenomic datasets require reference sequences and read alignments. Existing alignment-free dissimilarity approaches model the background sequences with a Fixed Order Markov Chain (FOMC) and yield promising results for the comparison of microbial communities. However, in an FOMC the number of parameters grows exponentially with the order of the Markov Chain (MC). Under a fixed high order, the parameters might not be accurately estimated because sequencing depth is limited. In our study, we investigate an alternative to FOMC: modeling background sequences in metatranscriptomic data with the data-driven Variable Length Markov Chain (VLMC). The VLMC, originally designed for long sequences, was extended to high-throughput sequencing reads, and strategies to estimate the corresponding parameters were developed. The flexible number of parameters in a VLMC avoids estimating the vast number of parameters of a high-order MC under limited sequencing depth. Unlike FOMC, where the order is selected manually, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC in modeling the background sequences of transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
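The exponential parameter growth mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline; the function names and the toy reads are hypothetical, and the only assumption is a four-letter nucleotide alphabet, for which an order-k Markov chain has 3 × 4^k free transition parameters. The sketch also counts how many length-k contexts actually appear in a handful of short reads, which hints at why high-order parameters are hard to estimate at limited depth.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain nucleotide alphabet, no ambiguity codes


def fomc_free_parameters(order: int, alphabet_size: int = 4) -> int:
    """Free transition parameters of a fixed-order Markov chain.

    Each of the alphabet_size**order contexts has alphabet_size
    transition probabilities that sum to 1, so alphabet_size - 1 are free.
    """
    return (alphabet_size - 1) * alphabet_size ** order


def observed_contexts(reads, order):
    """Count how often each length-`order` context is followed by a base."""
    counts = Counter()
    valid = set(ALPHABET)
    for read in reads:
        for i in range(len(read) - order):
            context = read[i:i + order]
            next_base = read[i + order]
            if set(context) <= valid and next_base in valid:
                counts[context] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy reads, for illustration only.
    toy_reads = ["ACGTACGTAC", "TTGACCGTAA", "GGCATACGTT"]
    for k in (1, 3, 6, 9):
        seen = len(observed_contexts(toy_reads, k))
        print(f"order {k}: {fomc_free_parameters(k)} free parameters, "
              f"{seen} of {4 ** k} contexts observed")
```

With real data the reads are far more numerous, but the same mismatch appears once the order is high enough; a variable-length model sidesteps it by keeping long contexts only where the data support them.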

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/736e/5120338/82842bdbd8ec/srep37243-f1.jpg

Similar articles

Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

Sci Rep. 2016 Nov 23;6:37243. doi: 10.1038/srep37243.

Comparison of metatranscriptomic samples based on k-tuple frequencies.

PLoS One. 2014 Jan 2;9(1):e84348. doi: 10.1371/journal.pone.0084348. eCollection 2014.

A New Context Tree Inference Algorithm for Variable Length Markov Chain Model with Applications to Biological Sequence Analyses.

J Comput Biol. 2022 Aug;29(8):839-856. doi: 10.1089/cmb.2021.0604. Epub 2022 Apr 22.

Gene finding in metatranscriptomic sequences.

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S8. doi: 10.1186/1471-2105-15-S9-S8. Epub 2014 Sep 10.

Algorithms for variable length Markov chain modeling.

Bioinformatics. 2004 Mar 22;20(5):788-9. doi: 10.1093/bioinformatics/btg489. Epub 2004 Jan 29.

Optimal choice of word length when comparing two Markov sequences using a χ²-statistic.

BMC Genomics. 2017 Oct 3;18(Suppl 6):732. doi: 10.1186/s12864-017-4020-z.

Comparison of assembly algorithms for improving rate of metatranscriptomic functional annotation.

Microbiome. 2014 Oct 28;2:39. doi: 10.1186/2049-2618-2-39. eCollection 2014.

ChimeRScope: a novel alignment-free algorithm for fusion transcript prediction using paired-end RNA-Seq data.

Nucleic Acids Res. 2017 Jul 27;45(13):e120. doi: 10.1093/nar/gkx315.

IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information.

J Comput Biol. 2015 May;22(5):367-76. doi: 10.1089/cmb.2014.0139. Epub 2014 Dec 23.

Effect of k-tuple length on sample-comparison with high-throughput sequencing data.

Biochem Biophys Res Commun. 2016 Jan 22;469(4):1021-7. doi: 10.1016/j.bbrc.2015.11.094. Epub 2015 Dec 22.

Cited by

SCRAPT: an iterative algorithm for clustering large 16S rRNA gene data sets.

Nucleic Acids Res. 2023 May 8;51(8):e46. doi: 10.1093/nar/gkad158.

Fast parallel construction of variable-length Markov chains.

BMC Bioinformatics. 2021 Oct 9;22(1):487. doi: 10.1186/s12859-021-04387-y.

Classifying the Lifestyle of Metagenomically-Derived Phages Sequences Using Alignment-Free Methods.

Front Microbiol. 2020 Nov 12;11:567769. doi: 10.3389/fmicb.2020.567769. eCollection 2020.

KmerGO: A Tool to Identify Group-Specific Sequences With k-mers.

Front Microbiol. 2020 Aug 25;11:2067. doi: 10.3389/fmicb.2020.02067. eCollection 2020.

Tomato RNA-seq Data Mining Reveals the Taxonomic and Functional Diversity of Root-Associated Microbiota.

Microorganisms. 2019 Dec 24;8(1):38. doi: 10.3390/microorganisms8010038.

Alignment-Free Sequence Analysis and Applications.

Annu Rev Biomed Data Sci. 2018 Jul;1:93-114. doi: 10.1146/annurev-biodatasci-080917-013431. Epub 2018 Apr 25.

Reads Binning Improves Alignment-Free Metagenome Comparison.

Front Genet. 2019 Nov 21;10:1156. doi: 10.3389/fgene.2019.01156. eCollection 2019.

Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression.

Genome Biol. 2019 Dec 4;20(1):266. doi: 10.1186/s13059-019-1872-3.

Identifying Sequences for Microbial Communities Using Long k-mer Sequence Signatures.

Front Microbiol. 2018 May 3;9:872. doi: 10.3389/fmicb.2018.00872. eCollection 2018.

MeShClust: an intelligent tool for clustering DNA sequences.

Nucleic Acids Res. 2018 Aug 21;46(14):e83. doi: 10.1093/nar/gky315.

References

MetaTrans: an open-source pipeline for metatranscriptomics.

Sci Rep. 2016 May 23;6:26447. doi: 10.1038/srep26447.

Fast and sensitive taxonomic classification for metagenomics with Kaiju.

Nat Commun. 2016 Apr 13;7:11257. doi: 10.1038/ncomms11257.

Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics.

Bioinformatics. 2016 Apr 1;32(7):993-1000. doi: 10.1093/bioinformatics/btv395. Epub 2015 Jun 30.

Polyester: simulating RNA-seq datasets with differential transcript expression.

Bioinformatics. 2015 Sep 1;31(17):2778-84. doi: 10.1093/bioinformatics/btv272. Epub 2015 Apr 28.

CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers.

BMC Genomics. 2015 Mar 25;16(1):236. doi: 10.1186/s12864-015-1419-2.

Marine algae and land plants share conserved phytochrome signaling systems.

Proc Natl Acad Sci U S A. 2014 Nov 4;111(44):15827-32. doi: 10.1073/pnas.1416751111. Epub 2014 Sep 29.

Unraveling the stratification of an iron-oxidizing microbial mat by metatranscriptomics.

PLoS One. 2014 Jul 17;9(7):e102561. doi: 10.1371/journal.pone.0102561. eCollection 2014.

The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing.

PLoS Biol. 2014 Jun 24;12(6):e1001889. doi: 10.1371/journal.pbio.1001889. eCollection 2014 Jun.

Kraken: ultrafast metagenomic sequence classification using exact alignments.

Genome Biol. 2014 Mar 3;15(3):R46. doi: 10.1186/gb-2014-15-3-r46.

Comparison of metatranscriptomic samples based on k-tuple frequencies.

PLoS One. 2014 Jan 2;9(1):e84348. doi: 10.1371/journal.pone.0084348. eCollection 2014.
