Biological sequence modeling with convolutional kernel networks.

Author information

Université Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, Grenoble, Isère, France.

University of Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Lyon, Rhône, France.

Publication information

Bioinformatics. 2019 Sep 15;35(18):3294-3302. doi: 10.1093/bioinformatics/btz094.

Abstract

MOTIVATION

The growing number of annotated biological sequences makes it possible to learn genotype-phenotype relationships from data with increasingly high accuracy. When large quantities of labeled samples are available for training a model, convolutional neural networks can be used to predict the phenotype of unannotated sequences with good accuracy. Unfortunately, their performance degrades on medium- or small-scale datasets, which calls for new data-efficient approaches.

RESULTS

We introduce a hybrid approach between convolutional neural networks and kernel methods to model biological sequences. Our method enjoys the ability of convolutional neural networks to learn data representations that are adapted to a specific task, while the kernel point of view yields algorithms that perform significantly better when the amount of training data is small. We illustrate these advantages for transcription factor binding prediction and protein homology detection, and we demonstrate that our model is also simple to interpret, which is crucial for discovering predictive motifs in sequences.
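
To make the kernel point of view concrete, the following is a minimal sketch of a convolutional-style kernel between two DNA sequences. It is not the authors' implementation, and the abstract does not specify the construction. The Gaussian comparison of one-hot k-mers, the averaging over all position pairs, and the names `one_hot_kmers`, `kmer_kernel`, `sequence_kernel`, `k` and `sigma` are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet for this illustration


def one_hot_kmers(seq, k):
    """Return every length-k window of seq as a flat one-hot vector."""
    kmers = []
    for i in range(len(seq) - k + 1):
        vec = []
        for ch in seq[i:i + k]:
            vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
        kmers.append(vec)
    return kmers


def kmer_kernel(u, v, sigma):
    """Gaussian kernel between two one-hot encoded k-mers (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sequence_kernel(x, y, k=3, sigma=0.6):
    """Average the k-mer kernel over all pairs of positions in x and y."""
    kx, ky = one_hot_kmers(x, k), one_hot_kmers(y, k)
    total = sum(kmer_kernel(u, v, sigma) for u in kx for v in ky)
    return total / (len(kx) * len(ky))


# Hypothetical toy sequences: the first two share the motif "ACGT".
shared = sequence_kernel("ACGTACGT", "TTACGTTT")
unrelated = sequence_kernel("ACGTACGT", "GGGGGGGG")
print(shared > unrelated)
```

In this sketch, two sequences score highly when they contain similar k-mers at any positions, which is the kind of similarity a kernel classifier can exploit when few labeled samples are available. A hybrid model in the spirit of the abstract would additionally adapt the k-mer comparison to the prediction task, which this fixed kernel does not do.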

AVAILABILITY AND IMPLEMENTATION

Source code is freely available at https://gitlab.inria.fr/dchen/CKN-seq.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
