Department of Bio and Health Informatics.
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark.
Bioinformatics. 2017 Nov 15;33(22):3685-3690. doi: 10.1093/bioinformatics/btx531.
Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementing and training neural networks are the drivers of this development. Deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered on this field rather than on biology.
Here, we aim to further the development of deep learning methods within biology by providing application examples and ready-to-apply, adaptable code templates. Using these examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can be designed and trained with relative ease to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, prediction of protein secondary structure and prediction of peptide binding to MHC Class II molecules.
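As a rough illustration of the convolutional building block these architectures share, the sketch below one-hot encodes a protein sequence and scans it with a single 1D convolutional filter followed by ReLU and global max-pooling. It is a minimal pure-Python sketch under stated assumptions, not code from the lasagne4bio repository: the amino-acid ordering, the helper names (`one_hot`, `make_motif_kernel`, `conv1d_relu`, `global_max_pool`) and the hand-set "KR" motif filter are all hypothetical, whereas in a trained network the filter weights would be learned.

```python
from typing import List

# The 20 standard amino acids; this ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode a protein sequence as a (length x 20) one-hot matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        row[index[aa]] = 1.0  # raises KeyError for non-standard residues
        matrix.append(row)
    return matrix


def make_motif_kernel(motif: str) -> Matrix:
    """Hypothetical hand-set filter (width x 20) that responds to one exact motif."""
    return one_hot(motif)


def conv1d_relu(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution of x with one filter, followed by ReLU.

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is slid along the sequence without being flipped.
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for offset in range(width):
            for channel, weight in enumerate(kernel[offset]):
                s += weight * x[start + offset][channel]
        out.append(max(0.0, s))  # ReLU non-linearity
    return out


def global_max_pool(activations: List[float]) -> float:
    """Keep the strongest filter response anywhere in the sequence."""
    return max(activations)


# Usage: with bias -1.0 the filter fires only where both residues of "KR" match.
kr_kernel = make_motif_kernel("KR")
features = conv1d_relu(one_hot("MAKRLL"), kr_kernel, bias=-1.0)
print(features)                   # [0.0, 0.0, 1.0, 0.0, 0.0]
print(global_max_pool(features))  # 1.0
```

Max-pooling over the whole sequence makes the feature independent of where the motif occurs, which is one common way to turn variable-length sequences into fixed-size inputs; a recurrent (e.g. long short-term memory) layer is an alternative that instead carries positional context along the sequence.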
All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio.
Supplementary data are available at Bioinformatics online.