Information content of individual genetic sequences.

Author information

Schneider T D

Affiliation

National Cancer Institute, Frederick Cancer Research and Development Center, Laboratory of Mathematical Biology, P.O. Box B, Frederick, MD 21702-1201, USA.

Publication information

J Theor Biol. 1997 Dec 21;189(4):427-41. doi: 10.1006/jtbi.1997.0540.

DOI: 10.1006/jtbi.1997.0540
PMID: 9446751
Abstract

Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos only show the average sequence conservation, and inferring the conservation for individual sequences is difficult. This limitation is overcome by the individual information (Ri) technique described here. The method begins by generating a weight matrix from the frequencies of each nucleotide or amino acid at each position of the aligned sequences. This matrix is then applied to the sequences themselves to determine the sequence conservation of each individual sequence. The matrix is unique because the average of these assignments is the total sequence conservation, and there is only one way to construct such a matrix. For binding sites on polynucleotides, the weight matrix has a natural cutoff that distinguishes functional sequences from other sequences. Ri values are on an absolute scale measured in bits of information, so the conservation of different biological functions can be compared with one another. The matrix can be used to rank-order the sequences, to search for new sequences, to compare sequences to other quantitative data such as binding energy or distance between binding sites, to distinguish mutations from polymorphisms, to design sequences of a given strength, and to detect errors in databases. The Ri method has been used to identify previously undescribed but experimentally verified DNA binding sites. The individual information distribution was determined for E. coli ribosome binding sites, bacterial Fis binding sites, and human donor and acceptor splice junctions, among others. The distributions demonstrate clearly that the consensus sequence is highly unusual, and hence is a poor method to describe naturally occurring binding sites.
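The steps named in the abstract (build a weight matrix from per-position frequencies, apply it to each aligned sequence, and check that the average equals the total conservation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: each weight is taken as log2(alphabet size) + log2(frequency), any small-sample correction the full method may use is omitted, and the function names and the four example sites are hypothetical.

```python
import math
from collections import Counter


def individual_information_matrix(aligned, alphabet="ACGT"):
    """Build a weight matrix in bits: w[l][b] = log2(|alphabet|) + log2(f(b, l)).

    Assumption: no small-sample correction is applied.
    Symbols never observed at a position get a weight of -infinity.
    """
    n = len(aligned)
    width = len(aligned[0])
    h_max = math.log2(len(alphabet))  # 2 bits for nucleotides
    matrix = []
    for l in range(width):
        counts = Counter(seq[l] for seq in aligned)
        column = {}
        for b in alphabet:
            f = counts[b] / n
            column[b] = h_max + math.log2(f) if f > 0 else float("-inf")
        matrix.append(column)
    return matrix


def individual_information(seq, matrix):
    """Ri of one sequence: sum of the weights selected by its symbols."""
    return sum(column[b] for column, b in zip(matrix, seq))


def total_conservation(aligned, alphabet="ACGT"):
    """Total conservation in bits: sum over positions of (H_max - H(l))."""
    n = len(aligned)
    h_max = math.log2(len(alphabet))
    total = 0.0
    for l in range(len(aligned[0])):
        counts = Counter(seq[l] for seq in aligned)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += h_max - entropy
    return total


# Hypothetical aligned sites, invented for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
matrix = individual_information_matrix(sites)
ri_values = [individual_information(s, matrix) for s in sites]

for site, ri in zip(sites, ri_values):
    print(f"{site}: {ri:.3f} bits")
print(f"mean Ri = {sum(ri_values) / len(ri_values):.3f} bits")
print(f"total conservation = {total_conservation(sites):.3f} bits")
```

With these weights the mean of the Ri values over the aligned set matches the total conservation, which is the property the abstract attributes to the matrix. Scanning a new sequence with `individual_information` and ranking by the result is one way the same matrix could be reused for searching.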

Similar articles

1
Information content of individual genetic sequences.
J Theor Biol. 1997 Dec 21;189(4):427-41. doi: 10.1006/jtbi.1997.0540.
2
Anatomy of Escherichia coli ribosome binding sites.
J Mol Biol. 2001 Oct 12;313(1):215-28. doi: 10.1006/jmbi.2001.5040.
3
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.
Hum Mutat. 1995;6(1):74-6. doi: 10.1002/humu.1380060114.
4
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
5
Study of DNA binding sites using the Rényi parametric entropy measure.
J Theor Biol. 2004 Apr 7;227(3):429-36. doi: 10.1016/j.jtbi.2003.11.026.
6
Information analysis of human splice site mutations.
Hum Mutat. 1998;12(3):153-71. doi: 10.1002/(SICI)1098-1004(1998)12:3<153::AID-HUMU3>3.0.CO;2-I.
7
Variation in structural location and amino acid conservation of functional sites in protein domain families.
BMC Bioinformatics. 2005 Aug 25;6:210. doi: 10.1186/1471-2105-6-210.
8
An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments.
J Mol Biol. 2000 Aug 18;301(3):691-711. doi: 10.1006/jmbi.2000.3975.
9
Information theory analysis of the relationship between primary sequence structure and ligand recognition among a class of facilitated transporters.
J Theor Biol. 1995 May 21;174(2):179-88. doi: 10.1006/jtbi.1995.0090.
10
Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library.
J Mol Biol. 2000 Aug 25;301(4):783-93. doi: 10.1006/jmbi.2000.4000.

Cited by

1
Differential expression of the operon in a biofilm.
Appl Environ Microbiol. 2024 Nov 20;90(11):e0136224. doi: 10.1128/aem.01362-24. Epub 2024 Oct 22.
2
Genotyping Hepatitis B virus by Next-Generation Sequencing: Detection of Mixed Infections and Analysis of Sequence Conservation.
Int J Mol Sci. 2024 May 17;25(10):5481. doi: 10.3390/ijms25105481.
3
Language model enables end-to-end accurate detection of cancer from cell-free DNA.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae053.
4
Exploring the mono-/bistability range of positively autoregulated signaling systems in the presence of competing transcription factor binding sites.
PLoS Comput Biol. 2022 Nov 22;18(11):e1010738. doi: 10.1371/journal.pcbi.1010738. eCollection 2022 Nov.
5
The CAR-mRNA Interaction Surface Is a Zipper Extension of the Ribosome A Site.
Int J Mol Sci. 2022 Jan 26;23(3):1417. doi: 10.3390/ijms23031417.
6
Interpretable prioritization of splice variants in diagnostic next-generation sequencing.
Am J Hum Genet. 2021 Sep 2;108(9):1564-1577. doi: 10.1016/j.ajhg.2021.06.014. Epub 2021 Jul 21.
7
Information theoretic perspective on genome clustering.
Saudi J Biol Sci. 2021 Mar;28(3):1867-1889. doi: 10.1016/j.sjbs.2020.12.039. Epub 2020 Dec 31.
8
Molecular Mechanisms of Phosphate Sensing, Transport and Signalling in Streptomyces and Related Actinobacteria.
Int J Mol Sci. 2021 Jan 23;22(3):1129. doi: 10.3390/ijms22031129.
9
Restriction enzymes use a 24 dimensional coding space to recognize 6 base long DNA sequences.
PLoS One. 2019 Oct 31;14(10):e0222419. doi: 10.1371/journal.pone.0222419. eCollection 2019.
10
Activation of Secondary Metabolite Gene Clusters in by the PimM Regulator of .
Front Microbiol. 2019 Mar 26;10:580. doi: 10.3389/fmicb.2019.00580. eCollection 2019.