Seqenv: linking sequences to environments through text mining.

Author information

Sinclair Lucas, Ijaz Umer Z, Jensen Lars Juhl, Coolen Marco J L, Gubry-Rangin Cecile, Chroňáková Alica, Oulas Anastasis, Pavloudi Christina, Schnetzer Julia, Weimann Aaron, Ijaz Ali, Eiler Alexander, Quince Christopher, Pafilis Evangelos

Affiliations

Department of Ecology and Genetics, Limnology, Uppsala University, Uppsala, Sweden.

Infrastructure and Environment Research Division, School of Engineering, University of Glasgow, Glasgow, United Kingdom.

Publication information

PeerJ. 2016 Dec 20;4:e2690. doi: 10.7717/peerj.2690. eCollection 2016.

Abstract

Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts (if it is available) the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia-oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS data and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
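The last two steps described in the abstract (matching isolation-source text against a controlled vocabulary, then summarizing a sample by weighted summation) can be illustrated with a toy sketch. This is not seqenv's code or API: the vocabulary, the hit texts, the abundance table, and the function names `term_frequencies` and `summarize_sample` are all hypothetical, and a simple case-insensitive word match stands in for the actual text mining algorithm. Weighting by relative abundance is likewise an assumption about what "weighted summation" could look like.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-vocabulary standing in for EnvO terms (assumption).
ENVO_TERMS = ["soil", "sediment", "sea water"]

# Hypothetical isolation-source strings collected from each sequence's hits.
hits = {
    "otu_1": ["forest soil", "agricultural soil", "lake sediment", "rice paddy soil"],
    "otu_2": ["surface sea water", "Sea water, 10 m depth"],
}

# Hypothetical abundance of each sequence in each sample.
sample_abundances = {
    "sample_A": {"otu_1": 30, "otu_2": 10},
    "sample_B": {"otu_1": 0, "otu_2": 20},
}


def term_frequencies(sources, vocabulary):
    """Count vocabulary terms found in the source texts and normalize to 1."""
    counts = Counter()
    for text in sources:
        for term in vocabulary:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[term] += 1
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def summarize_sample(abundances, seq_profiles):
    """Sum per-sequence term profiles, weighted by relative abundance."""
    total = sum(abundances.values())
    profile = defaultdict(float)
    if total == 0:
        return {}
    for seq_id, count in abundances.items():
        if count == 0:
            continue  # absent sequences contribute nothing
        weight = count / total
        for term, freq in seq_profiles.get(seq_id, {}).items():
            profile[term] += weight * freq
    return dict(profile)


# Per-sequence environment profiles, then per-sample summaries.
seq_profiles = {seq: term_frequencies(src, ENVO_TERMS) for seq, src in hits.items()}
sample_profiles = {
    name: summarize_sample(ab, seq_profiles)
    for name, ab in sample_abundances.items()
}

for name, profile in sample_profiles.items():
    print(name, profile)
```

In this toy data, `sample_A` is dominated by a sequence whose hits mostly mention soil, so its summary leans toward "soil", while `sample_B` contains only the sequence associated with "sea water".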

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/869c/5178346/f0b536b18715/peerj-04-2690-g001.jpg
