
A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria.

Affiliations

Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.

Institute of Technology, University of Tartu, Tartu, Estonia.

Publication information

PLoS Comput Biol. 2018 Oct 22;14(10):e1006434. doi: 10.1371/journal.pcbi.1006434. eCollection 2018 Oct.

DOI: 10.1371/journal.pcbi.1006434
PMID: 30346947
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6211763/
Abstract

We have developed an easy-to-use and memory-efficient method called PhenotypeSeeker that (a) identifies phenotype-specific k-mers, (b) generates a k-mer-based statistical model for predicting a given phenotype and (c) predicts the phenotype from the sequencing data of a given bacterial isolate. The method was validated on 167 Klebsiella pneumoniae isolates (virulence), 200 Pseudomonas aeruginosa isolates (ciprofloxacin resistance) and 459 Clostridium difficile isolates (azithromycin resistance). The phenotype prediction models trained on these datasets achieved an F1-measure of 0.88 on the K. pneumoniae test set, 0.88 on the P. aeruginosa test set and 0.97 on the C. difficile test set. The F1-measures were the same for assembled sequences and raw sequencing data; however, building the model from assembled genomes is significantly faster. On these datasets, model building on a mid-range Linux server takes approximately 3 to 5 hours per phenotype if assembled genomes are used and 10 hours per phenotype if raw sequencing data are used. Phenotype prediction from assembled genomes takes less than one second per isolate. Thus, PhenotypeSeeker should be well suited for predicting phenotypes from large sequencing datasets. PhenotypeSeeker is implemented in the Python programming language, is open-source software and is available on GitHub (https://github.com/bioinfo-ut/PhenotypeSeeker/).
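The three steps named in the abstract (finding phenotype-specific k-mers, building a predictor from them, and scoring predictions with the F1-measure) can be illustrated with a toy sketch. This is not PhenotypeSeeker's code: the abstract does not say which statistical test or model the tool uses, so the selection rule below (k-mers shared by all positive isolates and absent from all negative ones) and the presence-based predictor are stand-in assumptions, and all function names and sequences are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def phenotype_specific_kmers(genomes: Dict[str, str],
                             labels: Dict[str, int],
                             k: int) -> Set[str]:
    """Toy criterion (an assumption, not the published method):
    keep k-mers found in every positive isolate and in no negative one."""
    pos = [kmers(s, k) for name, s in genomes.items() if labels[name] == 1]
    neg = [kmers(s, k) for name, s in genomes.items() if labels[name] == 0]
    shared = set.intersection(*pos) if pos else set()
    seen_in_neg = set.union(*neg) if neg else set()
    return shared - seen_in_neg


def predict(seq: str, markers: Set[str], k: int) -> int:
    """Predict phenotype 1 if the sequence carries any marker k-mer."""
    return int(bool(kmers(seq, k) & markers))


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1-measure: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical training isolates: two with the phenotype, two without.
train = {"iso1": "ACGTTGCA", "iso2": "TTACGTTG",
         "iso3": "GGGCCCAA", "iso4": "CCCAAGGT"}
train_labels = {"iso1": 1, "iso2": 1, "iso3": 0, "iso4": 0}
K = 4

markers = phenotype_specific_kmers(train, train_labels, K)

# Hypothetical test isolates with known phenotypes.
test_seqs = ["AAACGTTA", "GGGGCCCC"]
test_labels = [1, 0]
predictions = [predict(s, markers, K) for s in test_seqs]
score = f1_score(test_labels, predictions)
```

A real analysis would replace the all-or-nothing selection rule with a statistical association test and a trained model, since true markers rarely appear in every isolate with the phenotype.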

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e9b/6211763/892c26d547ba/pcbi.1006434.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e9b/6211763/fbaeb12a7534/pcbi.1006434.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e9b/6211763/ac1c81a3a820/pcbi.1006434.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e9b/6211763/d4e64d4e96c9/pcbi.1006434.g003.jpg
