Schneider Thomas D
National Cancer Institute at Frederick, Laboratory of Experimental and Computational Biology, P. O. Box B, Frederick, MD 21702-1201. (301) 846-5581 (-5532 for messages), fax: (301) 846-5598, email:
Biol Theory. 2006;1(3):250-260. doi: 10.1162/biot.2006.1.3.250.
This brief personal history describes how information theory can be applied to binding sites of genetic control molecules on nucleic acids. The primary example used is ribosome binding sites in Escherichia coli. Once the sites are aligned, the information needed to describe the sites can be computed using Claude Shannon's method. The result is displayed by a computer graphic called a sequence logo. The logo represents an average binding site, and the mathematics easily allows one to determine the components of this average. That is, given a set of binding sites, the information for individual binding sites can also be computed. One can go further and predict the information of sites that are not in the original data set. Information theory also allows one to model the flexibility of ribosome binding sites, and this led us to a simple model for ribosome translational initiation in which the molecular components fit together only when the ribosome is at a good ribosome binding site. Since information theory is general, the same mathematics applies to human splice junctions, where we can predict the effect of sequence changes that cause human genetic diseases and cancer. The second example given is the Pribnow 'box' which, when viewed by the information theory method, reveals a mechanism for initiation of both transcription and DNA replication. Replication, transcription, splicing, and translation into protein represent the central dogma, so these examples show how molecular information theory is contributing to our knowledge of basic biology.
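The two calculations mentioned in the abstract, the information of an aligned set of sites and the information of an individual site, can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the abstract: the four bases are treated as equally likely in the background (so the maximum is 2 bits per position), the small-sample correction used in practice is omitted, and the sequences in `toy_sites` are invented for illustration rather than real E. coli ribosome binding sites. The function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"  # DNA alphabet; with equiprobable bases the maximum is 2 bits


def position_frequencies(sites):
    """Return one dict of base frequencies per column of the alignment."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        freqs.append({b: counts.get(b, 0) / len(sites) for b in BASES})
    return freqs


def information_per_position(freqs):
    """Information at each position: 2 bits minus the Shannon uncertainty.

    Assumption: no small-sample correction is applied.
    """
    result = []
    for column in freqs:
        uncertainty = -sum(p * math.log2(p) for p in column.values() if p > 0)
        result.append(2.0 - uncertainty)
    return result


def individual_information(site, freqs):
    """Information of one site: sum over positions of 2 + log2(frequency).

    A base never seen at a position gets minus infinity in this
    uncorrected sketch.
    """
    total = 0.0
    for base, column in zip(site, freqs):
        p = column[base]
        if p == 0:
            return float("-inf")
        total += 2.0 + math.log2(p)
    return total


# Hypothetical toy alignment, invented for illustration only.
toy_sites = ["AGGAGG", "AGGAGA", "TGGAGG", "AGGTGG"]

freqs = position_frequencies(toy_sites)
r_sequence = sum(information_per_position(freqs))
r_individual = [individual_information(s, freqs) for s in toy_sites]
mean_individual = sum(r_individual) / len(r_individual)

print(f"total information of the set: {r_sequence:.3f} bits")
print(f"mean individual information:  {mean_individual:.3f} bits")
```

Under these assumptions the mean of the individual values equals the total for the set, which is the sense in which the logo represents an average binding site. Calling `individual_information` on a sequence outside `toy_sites` scores a site that was not in the original data set.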