Libbrecht Maxwell W, Noble William Stafford
Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA.
[1] Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA. [2] Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, Washington 98195-5065, USA.
Nat Rev Genet. 2015 Jun;16(6):321-32. doi: 10.1038/nrg3920. Epub 2015 May 7.
The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.