Introduction of 'Generalized Genomic Signatures' for the quantification of neighbour preferences leads to taxonomy- and functionality-based distinction among sequences.

Affiliations

Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310, Athens, Greece.

Genomics England, Charterhouse Square, London, EC1M 6BQ, UK.

Publication information

Sci Rep. 2019 Feb 8;9(1):1700. doi: 10.1038/s41598-018-38157-3.

Abstract

Analysis of DNA composition at several length scales constitutes the bulk of many early studies aimed at unravelling the complexity of the organization and functionality of genomes. Dinucleotide relative abundances are considered an idiosyncratic feature of genomes and are regarded as a 'genomic signature'. Motivated by this finding, we introduce 'Generalized Genomic Signatures' (GGSs), composed of the over- and under-abundances of all oligonucleotides of a given length, which filter out compositional trends and neighbour preferences at all shorter ranges. Previous works on alignment-free genomic comparisons mostly rely on k-mer frequencies rather than on distance-dependent neighbour preferences. In those approaches, nucleotide composition and proximity preferences are combined, whereas in the present work they are strictly separated, focusing exclusively on neighbour relationships. GGSs match, and in some cases outperform, genomic signatures defined at the dinucleotide level in distinguishing between taxonomic subdivisions of bacteria, and can be used more effectively in microbial phylogenetic reconstruction. Moreover, we compare DNA sequences from the human genome corresponding to protein-coding segments, conserved non-coding elements and non-functional DNA stretches. These classes of sequences have distinctive GGSs that reflect their genomic role and degree of conservation. Overall, GGSs constitute a trait characteristic of the evolutionary origin and functionality of different genomic segments.
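To illustrate the idea of over- and under-abundance measured relative to shorter-range composition, the following minimal Python sketch computes an observed/expected ratio ρ for every k-mer, taking the expected frequency from a maximal-order Markov model built from the word's (k-1)- and (k-2)-mer sub-words. For k = 2 this reduces to the classical dinucleotide relative abundance ρ_XY = f_XY / (f_X f_Y). The Markov normalization for k ≥ 3, the function names (`kmer_freqs`, `relative_abundance`, `signature_distance`) and the distance measure (a mean absolute difference of ρ values, in the spirit of Karlin's δ* but without reverse-complement symmetrization) are assumptions made for illustration. This is not necessarily the exact GGS definition used in the paper, and it does not model distance-dependent (gapped) neighbour preferences.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of all 4**k overlapping k-mers; windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"sequence has no valid {k}-mers")
    # Counter returns 0 for k-mers that never occur.
    return {"".join(w): counts["".join(w)] / total for w in product(ALPHABET, repeat=k)}


def relative_abundance(seq: str, k: int = 2) -> dict[str, float]:
    """Observed/expected ratio for every k-mer (k >= 2).

    ASSUMED normalization (maximal-order Markov model, illustrative only):
        rho(w) = f(w) * f(w[1:-1]) / (f(w[:-1]) * f(w[1:]))
    For k = 2 the middle word is empty (frequency 1), giving f_XY / (f_X * f_Y).
    """
    if k < 2:
        raise ValueError("k must be >= 2")
    fk = kmer_freqs(seq, k)
    fk1 = kmer_freqs(seq, k - 1)
    fk2 = kmer_freqs(seq, k - 2) if k > 2 else {"": 1.0}
    rho = {}
    for w, f in fk.items():
        denom = fk1[w[:-1]] * fk1[w[1:]]
        # Undefined when a flanking (k-1)-mer never occurs; marked as NaN.
        rho[w] = f * fk2[w[1:-1]] / denom if denom > 0 else math.nan
    return rho


def signature_distance(rho_a: dict[str, float], rho_b: dict[str, float]) -> float:
    """Mean |rho_a - rho_b| over k-mers defined in both signatures (hypothetical delta-style distance)."""
    diffs = [
        abs(rho_a[w] - rho_b[w])
        for w in rho_a
        if not (math.isnan(rho_a[w]) or math.isnan(rho_b[w]))
    ]
    if not diffs:
        raise ValueError("no k-mers defined in both signatures")
    return sum(diffs) / len(diffs)
```

For example, `signature_distance(relative_abundance(seq_a, 3), relative_abundance(seq_b, 3))` compares two sequences at the trinucleotide level. Because each ratio divides out the frequencies of the shorter sub-words, a k-mer whose occurrence is fully explained by shorter-range composition scores close to 1, so only the residual neighbour preference at length k remains.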

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8087/6368578/f2ebe29d3b5c/41598_2018_38157_Fig1_HTML.jpg