A dictionary based informational genome analysis.

Affiliation

Department of Computer Science, Strada Le Grazie 15, 37134 Verona, Italy.

Publication details

BMC Genomics. 2012 Sep 17;13:485. doi: 10.1186/1471-2164-13-485.

DOI: 10.1186/1471-2164-13-485
PMID: 22985068
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3577435/
Abstract

BACKGROUND

In the post-genomic era, several methods of computational genomics are emerging to understand how information as a whole is structured within genomes. The literature of the last five years reports several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among others, recent approaches are based on the empirical frequencies of DNA k-mers in whole genomes.
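As a generic illustration of the k-mer frequency approach mentioned above (not the specific metric used in this paper), the sketch below computes empirical k-mer frequencies over the ACGT alphabet and compares two sequences by the Euclidean distance between their frequency vectors. The function names and the choice of Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Empirical frequency of every k-mer over ACGT (absent k-mers get 0.0)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. 'N') are left out of the total.
    total = sum(c for w, c in counts.items() if set(w) <= set("ACGT"))
    freqs = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        freqs[word] = counts[word] / total if total else 0.0
    return freqs

def kmer_distance(a: str, b: str, k: int) -> float:
    """Alignment-free dissimilarity: Euclidean distance between k-mer frequency vectors."""
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    return math.sqrt(sum((fa[w] - fb[w]) ** 2 for w in fa))

if __name__ == "__main__":
    print(kmer_distance("ACGTACGTAC", "ACGTTTGTAC", k=2))
```

No alignment is computed: each sequence is reduced to a fixed-length vector of 4^k frequencies, which is what makes this family of measures cheap to apply to whole genomes.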

RESULTS

Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces local sequence analysis. A software prototype implementing the methodology outlined here carried out computations on genomic data: we computed informational indexes and built genomic dictionaries of different sizes, together with their frequency distributions. The software performed three main tasks: computation of informational indexes, storage of the indexes in a database, and index analysis and visualization. The methodology was validated on the genomes of various organisms. We discuss a systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to detect over-represented functional sequences such as promoters), and suggest a method to define synthetic genetic networks.
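A minimal sketch of a genomic dictionary, assuming the words are fixed-length k-mers. The abstract does not specify which informational indexes the software computes, so the Shannon entropy of the k-mer distribution and the maximal repeat length used here are illustrative choices, and all function names are hypothetical.

```python
from collections import Counter
import math

def dictionary(genome: str, k: int) -> Counter:
    """k-mer dictionary: every length-k factor of the genome with its multiplicity."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def repeats(genome: str, k: int) -> set[str]:
    """Words of length k occurring at least twice."""
    return {w for w, c in dictionary(genome, k).items() if c >= 2}

def hapaxes(genome: str, k: int) -> set[str]:
    """Words of length k occurring exactly once."""
    return {w for w, c in dictionary(genome, k).items() if c == 1}

def kmer_entropy(genome: str, k: int) -> float:
    """Shannon entropy (bits) of the empirical k-mer distribution."""
    d = dictionary(genome, k)
    n = sum(d.values())
    return -sum((c / n) * math.log2(c / n) for c in d.values())

def max_repeat_length(genome: str) -> int:
    """Largest k such that some k-mer is repeated (0 if no symbol repeats)."""
    k = 0
    # Any prefix of a repeated word is also repeated, so the first k + 1
    # with no repeats marks the maximum.
    while repeats(genome, k + 1):
        k += 1
    return k

if __name__ == "__main__":
    g = "ACGTACGTTACG"
    for k in (1, 2, 3):
        print(k, len(dictionary(g, k)), len(repeats(g, k)), round(kmer_entropy(g, k), 3))
    print("max repeat length:", max_repeat_length(g))
```

Scanning k upward, as in `max_repeat_length`, is one way to run a "systematic analysis of repeats of several lengths"; it is quadratic and meant only to show the idea, not to scale to real genomes.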

CONCLUSIONS

We introduced a dictionary-based methodology and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, for instance by exporting it to other contexts of computational genomics as a basis for discriminating genomic pathologies.
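The abstract does not describe the motif-finding algorithm itself. As a hypothetical stand-in, the sketch below flags over-represented k-mers (the kind of signal the Results associate with promoters) by the ratio of the observed count to the count expected if bases were drawn independently with the genome's own composition. This zero-order background model and the function name are assumptions, not the paper's method.

```python
from collections import Counter

def overrepresented_kmers(genome: str, k: int, top: int = 5) -> list[tuple[str, float]]:
    """Rank k-mers by observed / expected count under an i.i.d. base-composition model."""
    genome = genome.upper()
    n = len(genome)
    windows = n - k + 1
    if k <= 0 or windows <= 0:
        return []
    base = Counter(genome)
    observed = Counter(genome[i:i + k] for i in range(windows))
    scores = {}
    for word, count in observed.items():
        # Probability of the word if each position were drawn independently
        # from the genome's own base frequencies.
        p = 1.0
        for ch in word:
            p *= base[ch] / n
        scores[word] = count / (windows * p)
    # Highest observed/expected ratio first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]

if __name__ == "__main__":
    g = "TATAAT" * 3 + "GCGCCGATCGGCTA" * 2
    for word, ratio in overrepresented_kmers(g, k=4):
        print(word, round(ratio, 2))
```

A richer background (for example a higher-order Markov model) would usually be preferred in practice; the zero-order model keeps the sketch short.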

Figure images from the PMC full text:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/84a2e49af3bf/1471-2164-13-485-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/a65927b893b5/1471-2164-13-485-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/e5ef53a8da68/1471-2164-13-485-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/c39efd9dc667/1471-2164-13-485-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/8e169bbd2caa/1471-2164-13-485-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/4536a9f0f9f0/1471-2164-13-485-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/9e6ce0f26641/1471-2164-13-485-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/275ff3908611/1471-2164-13-485-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ae/3577435/e68ccff6483b/1471-2164-13-485-9.jpg

Similar articles

1
A dictionary based informational genome analysis.
BMC Genomics. 2012 Sep 17;13:485. doi: 10.1186/1471-2164-13-485.
2
A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes.
BMC Genomics. 2008 Oct 31;9:517. doi: 10.1186/1471-2164-9-517.
3
Bioinformatics software for biologists in the genomics era.
Bioinformatics. 2007 Jul 15;23(14):1713-7. doi: 10.1093/bioinformatics/btm239. Epub 2007 May 7.
4
Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.
Gigascience. 2020 May 1;9(5). doi: 10.1093/gigascience/giaa048.
5
Informational laws of genome structures.
Sci Rep. 2016 Jun 29;6:28840. doi: 10.1038/srep28840.
6
Genome comparison without alignment using shortest unique substrings.
BMC Bioinformatics. 2005 May 23;6:123. doi: 10.1186/1471-2105-6-123.
7
Pan-Genome Storage and Analysis Techniques.
Methods Mol Biol. 2018;1704:29-53. doi: 10.1007/978-1-4939-7463-4_2.
8
GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality.
Methods Mol Biol. 2016;1418:283-334. doi: 10.1007/978-1-4939-3578-9_15.
9
RIPCAL: a tool for alignment-based analysis of repeat-induced point mutations in fungal genomic sequences.
BMC Bioinformatics. 2008 Nov 12;9:478. doi: 10.1186/1471-2105-9-478.
10
Systematic prediction of cis-regulatory elements in the Chlamydomonas reinhardtii genome using comparative genomics.
Plant Physiol. 2012 Oct;160(2):613-23. doi: 10.1104/pp.112.200840. Epub 2012 Aug 22.

Cited by

1
Intrinsic laws of k-mer spectra of genome sequences and evolution mechanism of genomes.
BMC Evol Biol. 2020 Nov 23;20(1):157. doi: 10.1186/s12862-020-01723-3.
2
Automated, Efficient, and Accelerated Knowledge Modeling of the Cognitive Neuroimaging Literature Using the ATHENA Toolkit.
Front Neurosci. 2019 May 15;13:494. doi: 10.3389/fnins.2019.00494. eCollection 2019.
3
Evolutionary mechanism and biological functions of 8-mers containing CG dinucleotide in yeast.
Chromosome Res. 2017 Jun;25(2):173-189. doi: 10.1007/s10577-017-9554-z. Epub 2017 Feb 9.
4
Informational laws of genome structures.
Sci Rep. 2016 Jun 29;6:28840. doi: 10.1038/srep28840.
5
Extracting DNA words based on the sequence features: non-uniform distribution and integrity.
Theor Biol Med Model. 2016 Jan 25;13:2. doi: 10.1186/s12976-016-0028-3.