Biological sequence modeling with convolutional kernel networks.

Affiliations

Université Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, Grenoble, Isère, France.

University of Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Lyon, Rhône, France.

Publication information

Bioinformatics. 2019 Sep 15;35(18):3294-3302. doi: 10.1093/bioinformatics/btz094.

DOI:10.1093/bioinformatics/btz094

PMID:30753280

Abstract

MOTIVATION

The growing number of annotated biological sequences available makes it possible to learn genotype-phenotype relationships from data with increasingly high accuracy. When large quantities of labeled samples are available for training a model, convolutional neural networks can be used to predict the phenotype of unannotated sequences with good accuracy. Unfortunately, their performance degrades on medium- or small-scale datasets, which calls for new data-efficient approaches.

RESULTS

We introduce a hybrid approach between convolutional neural networks and kernel methods to model biological sequences. Our method enjoys the ability of convolutional neural networks to learn data representations that are adapted to a specific task, while the kernel point of view yields algorithms that perform significantly better when the amount of training data is small. We illustrate these advantages for transcription factor binding prediction and protein homology detection, and we demonstrate that our model is also simple to interpret, which is crucial for discovering predictive motifs in sequences.

AVAILABILITY AND IMPLEMENTATION

Source code is freely available at https://gitlab.inria.fr/dchen/CKN-seq.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

Bioinformatics. 2018 Oct 15;34(20):3427-3436. doi: 10.1093/bioinformatics/bty364.

Neural networks with circular filters enable data efficient inference of sequence motifs.

Bioinformatics. 2019 Oct 15;35(20):3937-3943. doi: 10.1093/bioinformatics/btz194.

Efficient implementation of convolutional neural networks in the data processing of two-photon in vivo imaging.

Bioinformatics. 2019 Sep 1;35(17):3208-3210. doi: 10.1093/bioinformatics/btz055.

pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks.

Bioinformatics. 2018 Sep 1;34(17):3035-3037. doi: 10.1093/bioinformatics/bty222.

Simple tricks of convolutional neural network architectures improve DNA-protein binding prediction.

Bioinformatics. 2019 Jun 1;35(11):1837-1843. doi: 10.1093/bioinformatics/bty893.

An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network.

Bioinformatics. 2020 Jul 1;36(13):4038-4046. doi: 10.1093/bioinformatics/btz825.

LZW-Kernel: fast kernel utilizing variable length code blocks from LZW compressors for protein sequence classification.

Bioinformatics. 2018 Oct 1;34(19):3281-3288. doi: 10.1093/bioinformatics/bty349.

Chromatin accessibility prediction via a hybrid deep convolutional neural network.

Bioinformatics. 2018 Mar 1;34(5):732-738. doi: 10.1093/bioinformatics/btx679.

FastSK: fast sequence analysis with gapped string kernels.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i857-i865. doi: 10.1093/bioinformatics/btaa817.

Cited by

WaveSeekerNet: accurate prediction of influenza A virus subtypes and host source using attention-based deep learning.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf089.

Inherently interpretable position-aware convolutional motif kernel networks for biological sequencing data.

Sci Rep. 2023 Oct 11;13(1):17216. doi: 10.1038/s41598-023-44175-7.

COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i76-i85. doi: 10.1093/bioinformatics/btad204.

Genomics enters the deep learning era.

PeerJ. 2022 Jun 24;10:e13613. doi: 10.7717/peerj.13613. eCollection 2022.

Feature selection for kernel methods in systems biology.

NAR Genom Bioinform. 2022 Mar 7;4(1):lqac014. doi: 10.1093/nargab/lqac014. eCollection 2022 Mar.

Application of deep learning in genomics.

Sci China Life Sci. 2020 Dec;63(12):1860-1878. doi: 10.1007/s11427-020-1804-5. Epub 2020 Oct 10.
