Distribution rules of 8-mer spectra and characterization of evolution state in animal genome sequences.

Affiliations

Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China.

School of Economics and Management, Inner Mongolia University of Science and Technology, Baotou, 014010, China.

Publication information

BMC Genomics. 2024 Sep 12;25(1):855. doi: 10.1186/s12864-024-10786-1.

DOI:10.1186/s12864-024-10786-1
PMID:39266973
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11391722/
Abstract

BACKGROUND

Studying the composition rules and evolution mechanisms of genome sequences is a core issue in the post-genomic era, and k-mer spectrum analysis of genome sequences is an effective means of addressing it.
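
As a minimal illustration of what an 8-mer spectrum is, the Python sketch below counts overlapping 8-mers in a sequence and then tallies how many distinct 8-mers occur with each frequency. The abstract does not define the spectrum, so this frequency-distribution reading, the skipping of windows with non-ACGT symbols, and the function names `kmer_counts` and `kmer_spectrum` are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

VALID = set("ACGT")


def kmer_counts(seq: str, k: int = 8) -> Counter:
    """Count overlapping k-mers in seq, skipping any window with a non-ACGT symbol (assumption)."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID:
            counts[word] += 1
    return counts


def kmer_spectrum(counts: Counter, k: int = 8) -> dict[int, int]:
    """Map each occurrence frequency f to the number of distinct k-mers seen exactly f times.

    All 4**k possible k-mers are enumerated, so k-mers absent from the sequence fall under f = 0.
    """
    spectrum: Counter = Counter()
    for word in map("".join, product("ACGT", repeat=k)):
        spectrum[counts[word]] += 1  # Counter returns 0 for unseen k-mers
    return dict(spectrum)
```

For the toy sequence `"ACGTACGTACGT"`, ACGTACGT occurs twice and three other 8-mers once, so the spectrum has 1 entry at f = 2, 3 at f = 1 and the remaining 65,532 of the 4^8 possible 8-mers at f = 0. A real genome would be processed the same way, only with far larger counts.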

RESULTS

We divided all 8-mers of genome sequences into 16 XY types according to the number of XY dinucleotides each 8-mer contains. Previous work found that independent unimodal distributions are observed only in the three CG-type 8-mer spectra, whereas non-CG-type 8-mer spectra show no such universal pattern from prokaryotes to eukaryotes. On this basis, we analyzed how the distributions of non-CG-type 8-mer spectra vary across 889 animal genome sequences. Following the evolutionary order of animals from primitive to more complex, we found that the spectrum distributions gradually transition from unimodal to trimodal. The relative distance from the average frequency of each type of non-CG 8-mer to the center frequency differs both within a species and among species. We further divided the 8-mers containing CG dinucleotides into 16 subsets, called XY1_CG1 subsets, in which each 8-mer contains both CG and XY dinucleotides. We found that the separability values of the XY1_CG1 spectra are closely related to the evolution and specificity of animals. Taking into account the constraint of Chargaff's second parity rule, we finally obtained 10 separability values as a feature set that characterizes the evolution state of genome sequences. To verify the rationality of this feature set, we performed binary classification tests with 14 common classification algorithms. Accuracy (Acc) ranged from 83.88% to 98.70% among birds, other vertebrates and mammals.
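
The XY-type grouping can be sketched in Python as below. The abstract does not say whether overlapping occurrences are counted (this matters for XY = AA, CC, GG and TT), so the sketch assumes overlapping counts; all function names are hypothetical, and the separability values themselves are not reproduced because the abstract does not define them. The sketch also shows one consistent reading of how Chargaff's second parity rule reduces 16 values to 10: reverse complementation maps an 8-mer with n copies of XY onto an 8-mer with n copies of the reverse complement of XY, so when each 8-mer and its reverse complement have nearly equal frequencies, the two XY types carry nearly the same information, and the 16 dinucleotides collapse into 10 reverse-complement classes (six pairs plus the self-complementary AT, TA, CG and GC).

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def count_dinucleotide(word: str, xy: str) -> int:
    """Count positions where xy occurs in word; overlapping hits are counted (assumption)."""
    return sum(1 for i in range(len(word) - 1) if word[i:i + 2] == xy)


def xy_type_groups(xy: str, k: int = 8) -> dict[int, list[str]]:
    """Partition all 4**k k-mers by how many xy dinucleotides each one contains."""
    groups: dict[int, list[str]] = {}
    for word in map("".join, product(BASES, repeat=k)):
        groups.setdefault(count_dinucleotide(word, xy), []).append(word)
    return groups


def reverse_complement_classes() -> list[tuple[str, ...]]:
    """Merge each of the 16 dinucleotides with its reverse complement."""
    seen: set[str] = set()
    classes: list[tuple[str, ...]] = []
    for xy in map("".join, product(BASES, repeat=2)):
        if xy in seen:
            continue
        members = tuple(sorted({xy, revcomp(xy)}))  # 1 member if self-complementary
        seen.update(members)
        classes.append(members)
    return classes


if __name__ == "__main__":
    cg_groups = xy_type_groups("CG")
    for n in sorted(cg_groups):
        print(f"CG{n}: {len(cg_groups[n])} 8-mers")
    classes = reverse_complement_classes()
    print(len(classes), classes)
```

With overlapping counts, an 8-mer can hold at most four CG dinucleotides (only CGCGCGCG does), so the CG-based partition has five groups, CG0 to CG4. The frequencies of the 8-mers in each group would then be taken from genome counts to form the corresponding XY-type spectrum.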

CONCLUSION

We proposed a credible feature set to characterize the evolution state of genomes and obtained satisfactory results using this feature set in large-scale classification of animals.


Figures:
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4310/11391722/de407a30d065/12864_2024_10786_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4310/11391722/fd5ba9bb7151/12864_2024_10786_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4310/11391722/a7c8aea70c3c/12864_2024_10786_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4310/11391722/abaa26fc433e/12864_2024_10786_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4310/11391722/15f0f9e80bfb/12864_2024_10786_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4310/11391722/93917ca725e2/12864_2024_10786_Fig6_HTML.jpg

Similar articles

1. Distribution rules of 8-mer spectra and characterization of evolution state in animal genome sequences.
BMC Genomics. 2024 Sep 12;25(1):855. doi: 10.1186/s12864-024-10786-1.
2. Exploring objective feature sets in constructing the evolution relationship of animal genome sequences.
BMC Genomics. 2023 Oct 24;24(1):634. doi: 10.1186/s12864-023-09747-x.
3. Intrinsic laws of k-mer spectra of genome sequences and evolution mechanism of genomes.
BMC Evol Biol. 2020 Nov 23;20(1):157. doi: 10.1186/s12862-020-01723-3.
4. Evolutionary mechanism and biological functions of 8-mers containing CG dinucleotide in yeast.
Chromosome Res. 2017 Jun;25(2):173-189. doi: 10.1007/s10577-017-9554-z. Epub 2017 Feb 9.
5. Spectrum structures and biological functions of 8-mers in the human genome.
Genomics. 2019 May;111(3):483-491. doi: 10.1016/j.ygeno.2018.03.006. Epub 2018 Mar 6.
6. Comparative analysis of DNA word abundances in four yeast genomes using a novel statistical background model.
PLoS One. 2013;8(3):e58038. doi: 10.1371/journal.pone.0058038. Epub 2013 Mar 5.
7. Genome classification improvements based on k-mer intervals in sequences.
Genomics. 2019 Dec;111(6):1574-1582. doi: 10.1016/j.ygeno.2018.11.001. Epub 2018 Nov 13.
8. K-mer natural vector and its application to the phylogenetic analysis of genetic sequences.
Gene. 2014 Aug 1;546(1):25-34. doi: 10.1016/j.gene.2014.05.043. Epub 2014 May 22.
9. Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.
J Theor Biol. 2015 Dec 21;387:88-100. doi: 10.1016/j.jtbi.2015.09.014. Epub 2015 Sep 30.
10. Large-scale genomic 2D visualization reveals extensive CG-AT skew correlation in bird genomes.
BMC Evol Biol. 2007 Nov 23;7:234. doi: 10.1186/1471-2148-7-234.
