Leung Henry C M, Yiu Siu-Ming, Chin Francis Y L
Department of Computer Science, The University of Hong Kong, Hong Kong, Hong Kong.
J Comput Biol. 2015 May;22(5):367-76. doi: 10.1089/cmb.2014.0139. Epub 2014 Dec 23.
Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study a microbial community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, it would seem that de novo assembly of the mRNAs is needed. However, NGS reads are short, and mRNAs share many similar regions and differ tremendously in abundance levels, making de novo assembly challenging. The existing assembler, IDBA-MT, is designed specifically for the assembly of metatranscriptomic data but performs well only on highly expressed mRNAs. This article introduces IDBA-MTP, which adopts a novel approach to metatranscriptomic assembly that makes use of the fact that there is a database of millions of known protein sequences associated with mRNAs. Using the protein information effectively is nontrivial, given the size of the database and given that different mRNAs might lead to proteins with similar functions (because different amino acids might have similar characteristics). IDBA-MTP employs a similarity measure between mRNAs and protein sequences, dynamic programming techniques, and seed-and-extend heuristics to tackle the problem effectively and efficiently. Experimental results show that IDBA-MTP outperforms existing assemblers by reconstructing 14% more mRNAs.
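To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch of how an mRNA fragment could be compared with a protein sequence: translate the mRNA in three reading frames, filter candidates with a shared k-mer seed, and score the pair with a local-alignment dynamic program. It is not IDBA-MTP's actual algorithm, which the abstract does not specify. The amino acid groups, the scores (+2 identical, +1 same group, -1 otherwise, -2 per gap), the seed length, and all function names are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical groups of amino acids with similar characteristics (a toy
# stand-in for a real substitution matrix such as BLOSUM62).
GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def translate(mrna, frame=0):
    """Translate an mRNA (or cDNA) string starting at the given frame offset."""
    dna = mrna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def residue_score(a, b):
    """Assumed scoring: +2 identical, +1 same group, -1 otherwise."""
    if a == b:
        return 2
    group = GROUP_OF.get(a)
    if group is not None and group == GROUP_OF.get(b):
        return 1
    return -1


def local_alignment_score(s, t, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            cur.append(max(0,
                           prev[j - 1] + residue_score(a, b),  # match/mismatch
                           prev[j] + gap,                      # gap in t
                           cur[j - 1] + gap))                  # gap in s
        best = max(best, max(cur))
        prev = cur
    return best


def shares_seed(s, t, k=3):
    """Seed step: do the two protein strings share an exact k-mer?"""
    seeds = {s[i:i + k] for i in range(len(s) - k + 1)}
    return any(t[j:j + k] in seeds for j in range(len(t) - k + 1))


def best_frame_hit(mrna, protein):
    """Return (score, frame) for the best of the three forward frames."""
    return max((local_alignment_score(translate(mrna, f), protein), f)
               for f in range(3))
```

In a seed-and-extend setting, `shares_seed` would be used to skip the quadratic-time alignment for most of a large protein database, and `best_frame_hit` would be run only on the surviving candidates. An exact-match seed can miss proteins that differ from the translation by conservative substitutions, so a practical tool would need a more tolerant seeding scheme.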