Predicting phenotypic traits of prokaryotes from protein domain frequencies.

Affiliation

Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Germany.

Publication information

BMC Bioinformatics. 2010 Sep 24;11:481. doi: 10.1186/1471-2105-11-481.

DOI: 10.1186/1471-2105-11-481
PMID: 20868492
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2955703/

Abstract

BACKGROUND

Establishing the relationship between an organism's genome sequence and its phenotype is a fundamental challenge that remains largely unsolved. Accurately predicting microbial phenotypes solely based on genomic features will allow us to infer relevant phenotypic characteristics when the availability of a genome sequence precedes experimental characterization, a scenario that is favored by the advent of novel high-throughput and single cell sequencing techniques.

RESULTS

We present a novel approach to predict the phenotype of prokaryotes directly from their protein domain frequencies. Our discriminative machine learning approach provides high prediction accuracy of relevant phenotypes such as motility, oxygen requirement or spore formation. Moreover, the set of discriminative domains provides biological insight into the underlying phenotype-genotype relationship and enables deriving hypotheses on the possible functions of uncharacterized domains.

CONCLUSIONS

Fast and accurate prediction of microbial phenotypes based on genomic protein domain content is feasible and has the potential to provide novel biological insights. First results of a systematic check for annotation errors indicate that our approach may also be applied to semi-automatic correction and completion of the existing phenotype annotation.
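
As a rough illustration of the idea described under RESULTS, the sketch below converts per-genome protein domain counts into relative frequencies and trains a linear discriminative classifier on them. Everything specific here is an assumption made for illustration: the abstract does not name the classifier, so a plain perceptron stands in for it, and the domain identifiers (DOM_FLAGELLAR and so on), the counts, and the motility labels are invented toy values, not data from the paper.

```python
def to_frequencies(counts, vocabulary):
    """Convert raw domain counts into relative frequencies over a fixed vocabulary."""
    total = sum(counts.get(domain, 0) for domain in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts.get(domain, 0) / total for domain in vocabulary]


def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_perceptron(X, y, max_epochs=5000):
    """Classic perceptron; labels are +1 / -1. Stops after an error-free pass."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(w, b, x):
    """Return +1 (trait present) or -1 (trait absent)."""
    return 1 if score(w, b, x) > 0 else -1


def rank_domains(vocabulary, w):
    """Order domains by absolute weight, most discriminative first."""
    return sorted(zip(vocabulary, w), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical domain vocabulary and toy genomes (label +1 = motile, -1 = non-motile).
vocabulary = ["DOM_FLAGELLAR", "DOM_CHEMOTAXIS", "DOM_RIBOSOMAL", "DOM_TRANSPORT"]
genomes = [
    ({"DOM_FLAGELLAR": 12, "DOM_CHEMOTAXIS": 8, "DOM_RIBOSOMAL": 50, "DOM_TRANSPORT": 30}, 1),
    ({"DOM_FLAGELLAR": 10, "DOM_CHEMOTAXIS": 6, "DOM_RIBOSOMAL": 55, "DOM_TRANSPORT": 29}, 1),
    ({"DOM_FLAGELLAR": 15, "DOM_CHEMOTAXIS": 9, "DOM_RIBOSOMAL": 48, "DOM_TRANSPORT": 28}, 1),
    ({"DOM_RIBOSOMAL": 52, "DOM_TRANSPORT": 40}, -1),
    ({"DOM_CHEMOTAXIS": 1, "DOM_RIBOSOMAL": 60, "DOM_TRANSPORT": 35}, -1),
    ({"DOM_RIBOSOMAL": 49, "DOM_TRANSPORT": 45}, -1),
]

X = [to_frequencies(counts, vocabulary) for counts, _ in genomes]
y = [label for _, label in genomes]
w, b = train_perceptron(X, y)

for domain, weight in rank_domains(vocabulary, w):
    print(f"{domain}: {weight:+.3f}")
```

The weights with the largest magnitude indicate which domains drive the decision, which mirrors in miniature how a set of discriminative domains can hint at the genotype-phenotype relationship. A real analysis would need held-out evaluation and far more genomes than this toy set.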
