Seqenv: linking sequences to environments through text mining.

Author information

Sinclair Lucas, Ijaz Umer Z, Jensen Lars Juhl, Coolen Marco J L, Gubry-Rangin Cecile, Chroňáková Alica, Oulas Anastasis, Pavloudi Christina, Schnetzer Julia, Weimann Aaron, Ijaz Ali, Eiler Alexander, Quince Christopher, Pafilis Evangelos

Affiliations

Department of Ecology and Genetics, Limnology, Uppsala University, Uppsala, Sweden.

Infrastructure and Environment Research Division, School of Engineering, University of Glasgow, Glasgow, United Kingdom.

Publication information

PeerJ. 2016 Dec 20;4:e2690. doi: 10.7717/peerj.2690. eCollection 2016.

DOI:10.7717/peerj.2690

PMID:28028456

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5178346/

Abstract

Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts the textual metadata field if it is available. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia-oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS data and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
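The last two steps described in the abstract (tagging isolation-source text with environment terms, then summarizing a sample by weighted summation) can be illustrated with a toy sketch. This is not seqenv's code or API: the vocabulary `TOY_VOCAB`, the function names, the example hits, and the choice of abundance as the weight are all assumptions made for illustration, and the real tool relies on the full EnvO vocabulary and a dedicated text-mining tagger.

```python
from collections import Counter, defaultdict

# Hypothetical miniature vocabulary: keyword -> environment label.
# The real EnvO vocabulary is far larger; these entries are illustrative only.
TOY_VOCAB = {
    "soil": "soil",
    "sediment": "sediment",
    "lake": "lake",
    "seawater": "sea water",
}

def tag_terms(isolation_source):
    """Return the toy environment labels whose keyword occurs in the text."""
    words = isolation_source.lower().replace(",", " ").split()
    return [TOY_VOCAB[w] for w in words if w in TOY_VOCAB]

def sequence_profile(isolation_sources):
    """Normalized term frequencies over all hits of one query sequence."""
    counts = Counter()
    for text in isolation_sources:
        counts.update(tag_terms(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}

def sample_profile(abundances, profiles):
    """Weight each sequence's profile by its abundance (assumed weight) and sum."""
    summed = defaultdict(float)
    for seq_id, abundance in abundances.items():
        for term, freq in profiles.get(seq_id, {}).items():
            summed[term] += abundance * freq
    return dict(summed)

# Made-up isolation-source strings standing in for metadata of search hits.
hits = {
    "otu1": ["forest soil", "agricultural soil", "lake sediment"],
    "otu2": ["seawater", "marine sediment"],
}
profiles = {seq_id: sequence_profile(texts) for seq_id, texts in hits.items()}
sample = sample_profile({"otu1": 30, "otu2": 10}, profiles)
print(sample)
```

Each sequence's profile is normalized before weighting so that a sequence with many database hits does not dominate the sample summary merely because it is well represented in the database; other weighting schemes (for example by hit similarity) would fit the same structure.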
