An ORFome assembly approach to metagenomics sequences analysis.

Author information

Ye Yuzhen, Tang Haixu

Affiliation

School of Informatics, Indiana University, Bloomington, Indiana 47408, USA.

Publication information

Comput Syst Bioinformatics Conf. 2008;7:3-13.

Abstract

Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. Current analyses of metagenomics data rely largely on computational tools originally designed for microbial genomics projects. Assembling metagenomic sequences is challenging mainly because of the short reads and the high species complexity of the community. Alternatively, individual (short) reads can be searched directly against databases of known genes (or proteins) to identify homologous sequences. This latter approach may have low sensitivity and specificity in identifying homologs, which may in turn bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The computational framework consists of three steps. First, each read from a metagenomics project is annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e., the ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied the MetaORFA approach to several metagenomics datasets with low-coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, ORFome assembly significantly increased the sensitivity of homology searching and may improve the diversity analysis of metagenomic data. This improvement is especially useful for metagenomic projects in which genome assembly fails because of low sequence coverage.
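The first two steps of the pipeline described above can be illustrated with a toy Python sketch. It is not the authors' MetaORFA implementation. ORF prediction is replaced here by a simple six-frame, stop-to-stop scan with a hypothetical minimum length, and the EULER assembler is replaced by a bare peptide de Bruijn graph whose contigs are its maximal non-branching paths (no error correction, repeat resolution, or handling of isolated cycles). The function names `six_frame_orfs` and `assemble`, and the `min_len` and `k` defaults, are illustrative assumptions. Step 3, homology search of the assembled peptides against a protein database, would be done with an external search tool and is not shown.

```python
from collections import defaultdict

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq):
    """Reverse complement; non-ACGT characters such as N are kept as-is."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(read, min_len=20):
    """Step 1 (simplified): stop-free peptide fragments of >= min_len residues
    from all six reading frames. The length cutoff is a crude stand-in for
    real ORF prediction; min_len is an arbitrary, hypothetical choice."""
    read = read.upper()
    orfs = []
    for strand in (read, revcomp(read)):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                if len(piece) >= min_len:
                    orfs.append(piece)
    return orfs


def assemble(peptides, k=6):
    """Step 2 (simplified): peptide de Bruijn graph assembly.
    Nodes are (k-1)-mers, each distinct k-mer is an edge, and contigs are the
    maximal non-branching paths. Isolated cycles and error correction are
    ignored; k=6 is an arbitrary, hypothetical default."""
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    kmers = {p[i:i + k] for p in peptides for i in range(len(p) - k + 1)}
    for km in sorted(kmers):
        out_edges[km[:-1]].append(km[1:])
        in_deg[km[1:]] += 1
    nodes = sorted(set(out_edges) | set(in_deg))

    def one_in_one_out(v):
        return in_deg[v] == 1 and len(out_edges[v]) == 1

    contigs = []
    for v in nodes:
        if one_in_one_out(v):
            continue  # interior node: it is reached by walking from a branch node
        for w in out_edges[v]:
            contig = v + w[-1]
            # Extend through nodes that have exactly one way in and one way out.
            while one_in_one_out(w):
                w = out_edges[w][0]
                contig += w[-1]
            contigs.append(contig)
    return contigs
```

For a list of DNA reads, `assemble([orf for r in reads for orf in six_frame_orfs(r)], k=6)` would return candidate peptide contigs. Assembling in peptide space, as the abstract describes, means that reads overlapping within a protein-coding region are joined through shared amino-acid k-mers rather than nucleotide overlaps.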
