

Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper.

Author affiliations

Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL 60208, USA.

Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA.

Publication information

Bioinformatics. 2022 Jan 12;38(3):612-620. doi: 10.1093/bioinformatics/btab752.

Abstract

MOTIVATION

Identifying variant forms of gene clusters of interest in phylogenetically proximate and distant taxa can help to infer their evolutionary histories and functions. Conserved gene clusters may differ by only a few genes, but these small differences can in turn induce substantial phenotypes, such as by the formation of pseudogenes or insertions interrupting regulation. Particularly as microbial genomes and metagenomic assemblies become increasingly abundant, unsupervised grouping of similar, but not necessarily identical, gene clusters into consistent bins can provide a population-level understanding of their gene content variation and functional homology.

RESULTS

We developed GeneGrouper, a command-line tool that uses a density-based clustering method to group gene clusters into bins. GeneGrouper demonstrated high recall and precision in benchmarks for the detection of the 23-gene Salmonella enterica LT2 Pdu gene cluster and four-gene Pseudomonas aeruginosa PAO1 Mex gene cluster among 435 genomes spanning mixed taxa. In a subsequent application investigating the diversity and impact of gene-complete and gene-incomplete LT2 Pdu gene clusters in 1130 S. enterica genomes, GeneGrouper identified a novel, frequently occurring pduN pseudogene. When investigated in vivo, introduction of the pduN pseudogene negatively impacted microcompartment formation. We next demonstrated the versatility of GeneGrouper by clustering distant homologous gene clusters and variable gene clusters found in integrative and conjugative elements.
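To illustrate what density-based binning of gene clusters can look like, the following is a minimal sketch and not GeneGrouper's actual implementation. It assumes each genomic region is summarized as a set of gene names, uses Jaccard distance on gene content as the dissimilarity, and applies a textbook DBSCAN. The toy regions, the `eps` and `min_pts` values, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Optional


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; 0.0 means identical gene content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(items: List[FrozenSet[str]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN over gene-content sets. Label -1 marks noise."""
    n = len(items)
    # Each point's neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if jaccard_distance(items[i], items[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    bin_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        bin_id += 1
        labels[i] = bin_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = bin_id  # noise reachable from a core point joins the bin
            if labels[j] is not None:
                continue
            labels[j] = bin_id
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the bin
    return [label if label is not None else -1 for label in labels]


# Toy data (illustrative only, not taken from the paper's benchmarks).
regions: Dict[str, FrozenSet[str]] = {
    "genome_A": frozenset({"pduA", "pduB", "pduC", "pduD", "pduN"}),
    "genome_B": frozenset({"pduA", "pduB", "pduC", "pduD", "pduN"}),
    "genome_C": frozenset({"pduA", "pduB", "pduC", "pduD"}),  # lacks pduN
    "genome_D": frozenset({"mexA", "mexB", "oprM", "mexR"}),
    "genome_E": frozenset({"mexA", "mexB", "oprM"}),
    "genome_F": frozenset({"mexA", "mexB", "oprM", "mexR"}),
    "genome_G": frozenset({"xylA", "xylB"}),  # unrelated region
}

names = list(regions)
labels = dbscan([regions[name] for name in names], eps=0.3, min_pts=2)
bins = dict(zip(names, labels))
print(bins)
```

In this toy run, regions that differ by a single gene fall within `eps` of each other and share a bin, while the unrelated region is labeled as noise. That mirrors the idea of grouping similar, but not necessarily identical, gene clusters.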

AVAILABILITY AND IMPLEMENTATION

GeneGrouper software and code are publicly available at https://pypi.org/project/GeneGrouper/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
