Cootes Adrian P, Muggleton Stephen H, Sternberg Michael J E
Cancer Research UK, Biomolecular Modelling Laboratory, 44 Lincoln's Inn Fields, London WC2A 3PX, UK.
J Mol Biol. 2003 Jul 18;330(4):839-50. doi: 10.1016/s0022-2836(03)00620-x.
The study of protein structure has been driven largely by the careful inspection of experimental data by human experts. However, the rapid determination of protein structures from structural-genomics projects will make it increasingly difficult to analyse the distribution of proteins in fold space, and to determine the principles responsible for it, by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the growing emphasis on high-throughput experimental initiatives, machine-learning and other automated methods of analysis will become increasingly important for many biological problems.
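The abstract does not describe the learning method itself. As a rough illustration of what "learning rules that describe a fold" and "statistically significant" can mean, here is a minimal Python sketch under stated assumptions: each protein is reduced to a set of hypothetical boolean structural features, a rule is a conjunction of such features, and a one-sided hypergeometric test estimates how unlikely the rule's coverage of a fold would be by chance. This is a swapped-in toy propositional rule search, not the authors' method; the feature names, fold labels (`fold_A`, `fold_B`) and data are invented for illustration and make no biological claim.

```python
from itertools import combinations
from math import comb

# Hypothetical toy data (NOT from the paper): each protein is a set of
# invented boolean structural features plus a fold label.
DATA: list[tuple[frozenset[str], str]] = [
    (frozenset({"antiparallel_sheet", "greek_key"}), "fold_A"),
    (frozenset({"antiparallel_sheet", "greek_key"}), "fold_A"),
    (frozenset({"antiparallel_sheet", "greek_key", "helix_pair"}), "fold_A"),
    (frozenset({"antiparallel_sheet"}), "fold_B"),
    (frozenset({"parallel_sheet", "helix_pair"}), "fold_B"),
    (frozenset({"helix_pair"}), "fold_B"),
    (frozenset({"parallel_sheet"}), "fold_B"),
    (frozenset({"antiparallel_sheet", "helix_pair"}), "fold_B"),
]


def covers(rule: frozenset[str], features: frozenset[str]) -> bool:
    """A conjunctive rule covers a protein if every condition in it holds."""
    return rule <= features


def learn_rule(data, target: str, max_len: int = 2) -> frozenset[str]:
    """Search all conjunctions of up to max_len features and return the one with
    the highest precision for `target`, breaking ties by the number of target
    proteins covered. Shorter rules are tried first and only a strictly better
    score replaces the current best, so ties keep the simpler rule."""
    features = sorted(set().union(*(f for f, _ in data)))
    best, best_score = frozenset(), (0.0, 0)
    for length in range(1, max_len + 1):
        for combo in combinations(features, length):
            rule = frozenset(combo)
            covered = [label for f, label in data if covers(rule, f)]
            if not covered:
                continue
            positives = sum(label == target for label in covered)
            score = (positives / len(covered), positives)
            if score > best_score:
                best, best_score = rule, score
    return best


def rule_stats(data, rule: frozenset[str], target: str) -> tuple[int, int, int, int]:
    """Return (N, K, n, k): all proteins, target-fold proteins,
    proteins covered by the rule, and target-fold proteins covered."""
    N = len(data)
    K = sum(label == target for _, label in data)
    covered = [label for f, label in data if covers(rule, f)]
    return N, K, len(covered), sum(label == target for label in covered)


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n): the chance that a
    random set of n proteins would contain at least k of the K targets."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


rule = learn_rule(DATA, "fold_A")
N, K, n, k = rule_stats(DATA, rule, "fold_A")
print(sorted(rule), n, k, hypergeom_pvalue(N, K, n, k))
```

On this toy data the search picks the single condition `greek_key`, which covers all three `fold_A` proteins and nothing else, giving p = 1/56 (about 0.018). Because the p-value is computed on the same data used to choose the rule and ignores how many candidate rules were tried, it is optimistic; a real analysis would need held-out data or a multiple-testing correction.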