Lichtarge Olivier, Yao Hui, Kristensen David M, Madabushi Srinivasan, Mihalek Ivana
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
J Struct Funct Genomics. 2003;4(2-3):159-66. doi: 10.1023/a:1026115125950.
A common difficulty in post-genomic biology is that large-scale techniques of data collection often strip away information on the biological context of the data. The result is a massive number of disconnected observations on sequence, structure, and function, in which the underlying patterns and biological meaning are obscured. One solution is to build computational filters that pick out sufficiently few facts, relevant to a query, that their relationship is immediately apparent and experimentally testable. Typically, these filters rely on mathematics and statistics, and on first principles from physics and chemistry. We show here that evolution itself can be used to filter sequence and structure data in order to identify evolutionarily important amino acids. A general property of these residues is that they form clusters in native protein structures and point to regions where mutations have the greatest biological impact. The result is an accurate method of functional site annotation that is scalable for structural proteomics.