HoSeIn: A Workflow for Integrating Various Homology Search Results from Metagenomic and Metatranscriptomic Sequence Datasets.

Authors

Rozadilla Gaston, Clemente Jorgelina Moreiras, McCarthy Christina B

Affiliations

Centro Regional de Estudios Genómicos, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Argentina.

Departamento de Informática y Tecnología, Universidad Nacional del Noroeste de la Provincia de Buenos Aires, Pergamino, Buenos Aires, Argentina.

Publication information

Bio Protoc. 2020 Jul 20;10(14):e3679. doi: 10.21769/BioProtoc.3679.

DOI:10.21769/BioProtoc.3679

PMID:33659350

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7842381/

Abstract

Data generated by metagenomic and metatranscriptomic experiments are both enormous and inherently noisy. When using taxonomy-dependent, alignment-based methods to classify and label reads, the first step consists of performing homology searches against sequence databases. To obtain the most information from the samples, nucleotide sequences are usually compared to various databases (nucleotide and protein) using local sequence aligners such as BLASTN and BLASTX. Nevertheless, analyzing and integrating these results can be problematic because the outputs of these searches usually show inconsistencies, which can be especially pronounced when working with RNA-seq. Moreover, to the best of our knowledge, existing tools do not cross-reference and integrate information from the different homology searches, but instead report the results of each analysis separately. We developed the HoSeIn workflow to intersect the information from these homology searches, and then determine the taxonomic and functional profile of the sample using this integrated information. The workflow is based on the assumption that the sequences that correspond to a certain taxon are composed of (1) sequences that were assigned to the same taxon by both homology searches and (2) sequences that were assigned to that taxon by one of the homology searches but returned no hits in the other one.
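
The assumption stated in the abstract can be illustrated with a minimal Python sketch. This is not the HoSeIn implementation: the function name `integrate_assignments`, the dictionary inputs, and the `UNRESOLVED` label are hypothetical, introduced only for illustration. The abstract does not say how sequences assigned to different taxa by the two searches are handled, so the sketch simply sets them aside as unresolved; that choice is an assumption.

```python
from typing import Dict, Optional

# Hypothetical label for sequences whose two searches disagree.
# The abstract does not describe this case; setting them aside is an assumption.
UNRESOLVED = "<unresolved>"


def integrate_assignments(
    nucleotide_hits: Dict[str, Optional[str]],
    protein_hits: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Combine per-sequence taxon assignments from two homology searches.

    Each input maps a sequence ID to the taxon assigned by one search
    (for example a BLASTN-style and a BLASTX-style search). A value of
    None, or a missing key, means the search returned no hits.
    """
    integrated: Dict[str, Optional[str]] = {}
    for seq_id in sorted(set(nucleotide_hits) | set(protein_hits)):
        nt_taxon = nucleotide_hits.get(seq_id)
        aa_taxon = protein_hits.get(seq_id)
        if nt_taxon is not None and aa_taxon is not None:
            # Both searches returned a hit: keep the taxon only if they agree.
            integrated[seq_id] = nt_taxon if nt_taxon == aa_taxon else UNRESOLVED
        else:
            # At most one search returned a hit: keep that assignment.
            # If neither did, the sequence stays unassigned (None).
            integrated[seq_id] = nt_taxon if nt_taxon is not None else aa_taxon
    return integrated


if __name__ == "__main__":
    # Toy assignments, invented for illustration only.
    nucleotide = {"seq1": "Bacteria", "seq2": "Fungi", "seq3": None, "seq4": "Bacteria"}
    protein = {"seq1": "Bacteria", "seq2": None, "seq3": "Viruses", "seq4": "Fungi"}
    for seq_id, taxon in integrate_assignments(nucleotide, protein).items():
        print(seq_id, taxon)
```

In this toy input, `seq1` falls under case (1) of the assumption, `seq2` and `seq3` under case (2), and `seq4` is the conflicting case that the sketch leaves unresolved.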
