Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles.

Affiliation

Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece.

Publication information

PLoS One. 2013;8(1):e52854. doi: 10.1371/journal.pone.0052854. Epub 2013 Jan 14.

Abstract

Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e., sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process: genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, then de-noised, and finally compared to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of similarity metrics and noise-reduction pre-processing stages. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration in the majority of cases, while the detected outliers are found to be genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
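The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the details, so everything concrete here is an assumption made for illustration: the fuzzy genome profile is taken to be the per-reference mean of the binary gene profiles, discretization uses a fixed 0.5 threshold, de-noising drops reference positions that are identical across all genome profiles, and the comparison uses a normalized Hamming distance. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Profile = List[int]  # presence (1) / absence (0) across the reference genomes


def fuzzy_genome_profile(gene_profiles: List[Profile]) -> List[float]:
    """Step 1 (assumed): per-reference fraction of genes that are present."""
    n_genes = len(gene_profiles)
    return [sum(column) / n_genes for column in zip(*gene_profiles)]


def discretize(fuzzy: List[float], threshold: float = 0.5) -> Profile:
    """Step 2 (assumed): binarize the fuzzy vector with a fixed threshold."""
    return [1 if value >= threshold else 0 for value in fuzzy]


def informative_columns(genome_profiles: Dict[str, Profile]) -> List[int]:
    """Step 3 (assumed de-noising): keep only reference positions at which
    the binary genome profiles differ from one another."""
    columns = zip(*genome_profiles.values())
    return [i for i, column in enumerate(columns) if len(set(column)) > 1]


def hamming(a: Profile, b: Profile, keep: List[int]) -> float:
    """Normalized Hamming distance restricted to the kept positions."""
    if not keep:
        return 0.0
    return sum(a[i] != b[i] for i in keep) / len(keep)


def gene_distances(
    genes: Dict[str, Dict[str, Profile]], threshold: float = 0.5
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Step 4: distance of every gene profile to every genome profile.

    The entry for a gene's own genome is its intra-genome distance;
    the remaining entries are its inter-genome distances.
    """
    binary = {
        genome: discretize(fuzzy_genome_profile(list(profiles.values())), threshold)
        for genome, profiles in genes.items()
    }
    keep = informative_columns(binary)
    return {
        (genome, name): {
            other: hamming(profile, binary[other], keep) for other in binary
        }
        for genome, profiles in genes.items()
        for name, profile in profiles.items()
    }


# Toy data (hypothetical): two genomes, five reference genomes per profile.
toy = {
    "genome_A": {
        "a1": [1, 1, 0, 0, 1],
        "a2": [1, 1, 0, 0, 1],
        "a3": [0, 0, 1, 1, 1],
    },
    "genome_B": {
        "b1": [0, 0, 1, 1, 1],
        "b2": [0, 0, 1, 1, 1],
        "b3": [0, 1, 1, 1, 1],
    },
}

distances = gene_distances(toy)
```

In this toy input, gene `a3` ends up at distance 1.0 from its own genome profile and 0.0 from that of `genome_B`, which is the kind of gene the abstract describes as differing from its own genome profile. The last reference position is present in every profile, so the assumed de-noising step discards it before distances are computed.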

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdcd/3544837/856e89c1fae7/pone.0052854.g001.jpg
