Distinguishing microbial genome fragments based on their composition: evolutionary and comparative genomic perspectives.

Affiliation

Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada.

Publication information

Genome Biol Evol. 2010 Jan 25;2:117-31. doi: 10.1093/gbe/evq004.

DOI:10.1093/gbe/evq004
PMID:20333228
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2839357/
Abstract

It is well known that patterns of nucleotide composition vary within and among genomes, although the reasons why these variations exist are not completely understood. Between-genome compositional variation has been exploited to assign environmental shotgun sequences to their most likely originating genomes, whereas within-genome variation has been used to identify recently acquired genetic material such as pathogenicity islands. Recent sequence assignment techniques have achieved high levels of accuracy on artificial data sets, but the relative difficulty of distinguishing lineages with varying degrees of relatedness, and different types of genomic sequence, has not been examined in depth. We investigated the compositional differences in a set of 774 sequenced microbial genomes, finding rapid divergence among closely related genomes, but also convergence of compositional patterns among genomes with similar habitats. Support vector machines were then used to distinguish all pairs of genomes based on genome fragments 500 nucleotides in length. The nearly 300,000 accuracy scores obtained from these trials were used to construct general models of distinguishability versus taxonomic and compositional indices of genomic divergence. Unusual genome pairs were evident from their large residuals relative to the fitted model, and we identified several factors including genome reduction, putative lateral genetic transfer, and habitat convergence that influence the distinguishability of genomes. The positional, compositional, and functional context of a fragment within a genome has a strong influence on its likelihood of correct classification, but in a way that depends on the taxonomic and ecological similarity of the comparator genome.
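
The pairwise classification step described in the abstract can be illustrated with a small sketch. The abstract does not state which compositional features, SVM kernel, or software were used, so everything below is an assumption made for illustration: trinucleotide frequencies as features, a linear SVM trained by stochastic subgradient descent on the L2-regularized hinge loss, and two synthetic "genomes" that differ only in GC content. The function names (`random_genome`, `fragments`, `composition_vector`, `train_linear_svm`, `pairwise_accuracy`) are hypothetical and do not come from the paper.

```python
import itertools
import random

K = 3  # oligonucleotide length (assumption; the abstract does not specify the features)
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def random_genome(length, gc, rng):
    """Synthetic stand-in for a genome: independent bases with a given GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def fragments(genome, size=500):
    """Cut a genome into non-overlapping fragments of `size` nucleotides."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


def composition_vector(seq):
    """k-mer frequencies, rescaled so a uniform composition maps to the zero vector."""
    counts = [0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        counts[INDEX[seq[i:i + K]]] += 1
    total = sum(counts)
    return [len(KMERS) * c / total - 1.0 for c in counts]


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=10, rng=None):
    """Linear SVM via stochastic subgradient descent on the regularized hinge loss.

    Labels must be +1 or -1. Returns the weight vector and the bias.
    """
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # shrinkage from the L2 penalty
            if margin < 1:  # hinge loss is active: move towards the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def pairwise_accuracy(genome_a, genome_b, train_fraction=0.7, seed=0):
    """Train on part of the fragments of two genomes, score on the held-out rest."""
    rng = random.Random(seed)
    data = [(composition_vector(f), 1) for f in fragments(genome_a)]
    data += [(composition_vector(f), -1) for f in fragments(genome_b)]
    rng.shuffle(data)
    cut = int(len(data) * train_fraction)
    train, test = data[:cut], data[cut:]
    w, b = train_linear_svm([x for x, _ in train], [y for _, y in train], rng=rng)
    return sum(predict(w, b, x) == y for x, y in test) / len(test)


# Usage: two toy genomes with clearly different GC content.
rng = random.Random(42)
at_rich = random_genome(50_000, gc=0.35, rng=rng)
gc_rich = random_genome(50_000, gc=0.65, rng=rng)
print(pairwise_accuracy(at_rich, gc_rich))
```

Repeating `pairwise_accuracy` over every pair in a genome collection would yield one accuracy score per pair, analogous in spirit to the scores the abstract describes. Real genomes that are closely related or share a habitat would be expected to be much harder to separate than this toy pair.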

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/533c1db14993/gbeevq004f01_3c.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/ce778c49f2f0/gbeevq004f02_ht.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/f1d68db8c539/gbeevq004f03_3c.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/de4db06a98e9/gbeevq004f04_ht.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/42e26b22ad98/gbeevq004f05_3c.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/f14f6510d341/gbeevq004f06_3c.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19f3/2839357/bedcbba364c9/gbeevq004f07_3c.jpg
