CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS.

Affiliations

Pendulum Therapeutics, Inc., San Francisco, CA 94107, USA.

European Bioinformatics Institute, Cambridge CB10 1SD, UK.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i36-i44. doi: 10.1093/bioinformatics/btac238.

DOI:10.1093/bioinformatics/btac238
PMID:35758804
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9235473/
Abstract

MOTIVATION

Genome-wide association studies (GWAS), which aim to find genetic variants associated with a trait, have been widely used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single-nucleotide polymorphisms to mobile genetic elements. This approach does not require a reference genome, making it easier to account for accessory genes. However, the same gene can exist in slightly different versions across different strains, so its association signal is split across many version-specific k-mers, leading to diluted effects.
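The dilution effect described above can be illustrated with a toy presence/absence table. This is only a minimal sketch, not the authors' pipeline: the function names `kmers` and `presence_matrix` and the toy sequences are hypothetical, and reverse complements are ignored for simplicity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_matrix(genomes, k):
    """Map each k-mer to a tuple of 0/1 flags, one flag per genome."""
    sets = [kmers(g, k) for g in genomes]
    all_kmers = sorted(set().union(*sets))
    return {km: tuple(int(km in s) for s in sets) for km in all_kmers}


# Toy data (assumption): genomes 0 and 1 carry two slightly different
# versions of the same "gene"; genome 2 lacks it entirely.
genomes = ["ACGTAC", "ACGAAC", "TTTTTT"]
pm = presence_matrix(genomes, 3)

# A version-specific k-mer is seen in only one of the two carriers...
print(pm["CGT"], pm["CGA"])
# ...while a k-mer shared by both versions is seen in both.
print(pm["ACG"])
```

In this toy case, each version-specific k-mer flags only one carrier, so testing k-mers one at a time spreads the gene's signal over several weaker covariates.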

RESULTS

Here, we overcome this issue by testing covariates built from closed connected subgraphs (CCSs) of the de Bruijn graph defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS in terms of both power and interpretability. However, a method naively testing all possible subgraphs would be powerless due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all CCSs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. Our method integrates with existing visual tools to facilitate interpretation.
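The idea of a testable hypothesis can be sketched as follows. This is a hedged illustration under simplifying assumptions, not the CALDERA implementation: it assumes a one-sided Fisher's exact test (the abstract does not name the test statistic), it assumes a subgraph covariate is the logical OR of its k-mers' presence flags, and the names `or_covariate`, `min_pvalue` and `is_testable` are hypothetical.

```python
from math import comb


def or_covariate(rows):
    """Combine k-mer presence rows into one covariate (assumed: logical OR)."""
    return tuple(max(col) for col in zip(*rows))


def min_pvalue(n, n1, x):
    """Smallest one-sided Fisher p-value attainable for a covariate present
    in x of n genomes, when n1 genomes have the phenotype.

    The most extreme 2x2 table puts as many carriers as possible among the
    cases, so the minimum depends only on the margins, not on the data.
    """
    a_max = min(x, n1)
    # Hypergeometric probability of the single most extreme table.
    return comb(n1, a_max) * comb(n - n1, x - a_max) / comb(n, x)


def is_testable(n, n1, x, delta):
    """A hypothesis is testable at level delta if it could ever reach it."""
    return min_pvalue(n, n1, x) <= delta


# Two version-specific k-mers merge into one covariate covering both carriers.
print(or_covariate([(1, 0, 0), (0, 1, 0)]))

# With 20 genomes and 10 cases, a covariate present in only 3 genomes can
# never reach p <= 1e-3, so it can be skipped without being counted in the
# multiple testing correction; one present in 10 genomes can.
print(is_testable(20, 10, 3, 1e-3), is_testable(20, 10, 10, 1e-3))
```

Because the minimal p-value depends only on how many genomes carry the covariate, an enumeration scheme can in principle discard whole families of subgraphs from their carrier counts alone, which is the kind of pruning opportunity the abstract refers to.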

AVAILABILITY AND IMPLEMENTATION

We provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_ISMB.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
