IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information.

Authors

Leung Henry C M, Yiu Siu-Ming, Chin Francis Y L

Affiliation

Department of Computer Science, The University of Hong Kong, Hong Kong.

Publication

J Comput Biol. 2015 May;22(5):367-76. doi: 10.1089/cmb.2014.0139. Epub 2014 Dec 23.

Abstract

Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study the microbial community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, it would seem that de novo assembly of the mRNAs is needed. However, NGS reads are short, and mRNAs share many similar regions and differ tremendously in abundance levels, making de novo assembly challenging. The existing assembler, IDBA-MT, is designed specifically for the assembly of metatranscriptomic data but performs well only on highly expressed mRNAs. This article introduces IDBA-MTP, which adopts a novel approach to metatranscriptomic assembly that makes use of the fact that there is a database of millions of known protein sequences associated with mRNAs. How to use the protein information effectively is nontrivial, given the size of the database and given that different mRNAs might lead to proteins with similar functions (because different amino acids might have similar characteristics). IDBA-MTP employs a similarity measure between mRNAs and protein sequences, dynamic programming techniques, and seed-and-extend heuristics to tackle the problem effectively and efficiently. Experimental results show that IDBA-MTP outperforms existing assemblers by reconstructing 14% more mRNAs.
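To make the idea of an mRNA-to-protein similarity measure computed by dynamic programming more concrete, the following is a minimal illustrative sketch and not the IDBA-MTP method itself, whose scoring scheme the abstract does not specify. It translates an mRNA in one reading frame using the standard genetic code and then computes a Smith-Waterman local alignment score against a protein. The function names (`translate`, `substitution`, `local_score`), the amino acid groups in `SIMILAR_GROUPS`, and all score values are assumptions chosen purely for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical grouping of amino acids with similar characteristics
# (illustrative only; not taken from the paper).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def translate(mrna, frame=0):
    """Translate an mRNA (U or T alphabet) in the given reading frame."""
    seq = mrna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Unknown codons (e.g. containing N) become 'X'.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def substitution(x, y):
    """Toy score: +2 identical, +1 same group, -1 otherwise (assumed values)."""
    if x == y:
        return 2
    if any(x in g and y in g for g in SIMILAR_GROUPS):
        return 1
    return -1


def local_score(a, b, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            s = max(0,
                    prev[j - 1] + substitution(x, y),  # align x with y
                    prev[j] + gap,                     # gap in b
                    cur[j - 1] + gap)                  # gap in a
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


if __name__ == "__main__":
    peptide = translate("AUGAAAGUU")
    print(peptide, local_score(peptide, "MKI"))
```

In this sketch, a conservative substitution such as valine for isoleucine still earns a positive score, which is one simple way to reflect the observation that different amino acids might have similar characteristics. A practical tool would also need to consider all reading frames and avoid aligning against every database entry, which is where seed-and-extend heuristics come in.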
