Danziger Samuel A, Zeng Jue, Wang Ying, Brachmann Rainer K, Lathrop Richard H
Department of Biomedical Engineering, University of California, Irvine, California 92697, USA.
Bioinformatics. 2007 Jul 1;23(13):i104-14. doi: 10.1093/bioinformatics/btm166.
Many biomedical projects would benefit from reducing the time and expense of in vitro experimentation by using computer models for in silico predictions. These models may help determine which expensive biological data are most useful to acquire next. Active Learning techniques for choosing the most informative data enable biologists and computer scientists to optimize experimental data choices for rapid discovery of biological function. To explore design choices that affect this desirable behavior, five novel and five existing Active Learning techniques, together with three control methods, were tested on 57 previously unknown p53 cancer rescue mutants for their ability to build classifiers that predict protein function. The best of these techniques, Maximum Curiosity, improved accuracy from a baseline of 56% to 77%. This article shows that Active Learning is a useful tool for biomedical research, and provides a case study of interest to others facing similar discovery challenges.