
Applying Shannon's information theory to bacterial and phage genomes and metagenomes.

Affiliation

Computational Science Research Center, San Diego State University, San Diego, CA, USA.

Publication information

Sci Rep. 2013;3:1033. doi: 10.1038/srep01033. Epub 2013 Jan 8.

DOI: 10.1038/srep01033
PMID: 23301154
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3539204/

Abstract

All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available database, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
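
The abstract notes that a genome's Shannon index depends on length, GC content, and word size. The sketch below shows one common way to compute such an index, as an illustration only: it assumes overlapping words (k-mers), base-2 logarithms, and no special handling of ambiguous bases, and it is not necessarily the authors' exact procedure. The function names `shannon_index` and `gc_content` are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_index(sequence: str, word_size: int = 1) -> float:
    """Shannon uncertainty, in bits, over overlapping words of length word_size.

    Assumption: words are counted with a sliding window of step 1.
    """
    seq = sequence.upper()
    n_words = len(seq) - word_size + 1
    if word_size < 1 or n_words < 1:
        raise ValueError("sequence must be at least word_size characters long")
    # Count every overlapping word, then turn counts into frequencies.
    counts = Counter(seq[i:i + word_size] for i in range(n_words))
    return -sum((c / n_words) * log2(c / n_words) for c in counts.values())


def gc_content(sequence: str) -> float:
    """Fraction of G and C characters in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for seq in ("ACGTACGTACGT", "AATTAATTAATT", "AAAAAAAAAAAA"):
        print(seq, gc_content(seq), shannon_index(seq, 1), shannon_index(seq, 2))
```

In this sketch a sequence with n words of size k can contain at most min(4^k, n) distinct words, so its index cannot exceed log2 of that number; this is one way that length and word size bound the measured value. A skewed base composition lowers the single-letter index below the 2-bit maximum.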


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad93/3539204/e2d3bba9ec4b/srep01033-f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad93/3539204/bb67bf2c8acc/srep01033-f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad93/3539204/292f3de96c61/srep01033-f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad93/3539204/f36416e1d53e/srep01033-f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad93/3539204/47a94cc4555d/srep01033-f5.jpg
