SENSE: Siamese neural network for sequence embedding and alignment-free comparison.

Author affiliations

Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA.

Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY, USA.

Publication information

Bioinformatics. 2019 Jun 1;35(11):1820-1828. doi: 10.1093/bioinformatics/bty887.

DOI:10.1093/bioinformatics/bty887
PMID:30346493
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7963080/
Abstract

MOTIVATION

Sequence analysis is arguably a foundation of modern biology. Classic approaches to sequence analysis are based on sequence alignment, which is limited when dealing with large-scale sequence data. A dozen alignment-free approaches have been developed to provide computationally efficient alternatives to alignment-based approaches. However, existing methods define sequence similarity based on various heuristics and can only provide rough approximations to alignment distances.

RESULTS

In this article, we developed a new approach, referred to as SENSE (SiamEse Neural network for Sequence Embedding), for efficient and accurate alignment-free sequence comparison. The basic idea is to use a deep neural network to learn an explicit embedding function based on a small training dataset to project sequences into an embedding space so that the mean square error between alignment distances and pairwise distances defined in the embedding space is minimized. To the best of our knowledge, this is the first attempt to use deep learning for alignment-free sequence analysis. A large-scale experiment was performed that demonstrated that our method significantly outperformed the state-of-the-art alignment-free methods in terms of both efficiency and accuracy.
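
The objective described above can be illustrated with a toy sketch. This is not the SENSE implementation: the linear embedding `W`, the one-hot encoding, the Hamming fraction used as a stand-in for alignment distance, and the example sequences are all assumptions made for illustration (the abstract specifies a deep network and true alignment distances). The sketch only shows the shape of the idea: one shared embedding function is applied to both sequences of a pair, and its parameters are fit so that Euclidean distances in the embedding space match the target distances in the mean-square sense.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA string into a one-hot vector (4 entries per base)."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq):
        x[pos, ALPHABET.index(base)] = 1.0
    return x.ravel()

def hamming_fraction(a, b):
    """Toy stand-in for an alignment distance (equal-length sequences only)."""
    return sum(p != q for p, q in zip(a, b)) / len(a)

def loss_and_grad(W, D, target, eps=1e-12):
    """MSE between embedding distances and target distances, plus its gradient.

    D holds x_i - x_j for each pair. Because the same linear map W embeds
    both sequences, W @ x_i - W @ x_j equals W @ (x_i - x_j).
    """
    U = D @ W.T                          # embedded difference for each pair
    pred = np.linalg.norm(U, axis=1)     # Euclidean distance in embedding space
    err = pred - target
    loss = np.mean(err ** 2)
    grad = (2.0 / len(target)) * ((err / (pred + eps))[:, None] * U).T @ D
    return loss, grad

# Hypothetical training set: four short, equal-length sequences.
seqs = ["ACGTACGT", "ACGTACGA", "ACGTTTTT", "TTTTACGT"]
X = np.stack([one_hot(s) for s in seqs])
pairs = [(i, j) for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
D = np.stack([X[i] - X[j] for i, j in pairs])
target = np.array([hamming_fraction(seqs[i], seqs[j]) for i, j in pairs])

# Shared embedding parameters: map each sequence to a 4-dimensional vector.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((4, X.shape[1]))

initial_loss, _ = loss_and_grad(W, D, target)
for _ in range(500):                     # plain gradient descent
    _, grad = loss_and_grad(W, D, target)
    W -= 0.05 * grad
final_loss, _ = loss_and_grad(W, D, target)

print(f"MSE before training: {initial_loss:.4f}")
print(f"MSE after training:  {final_loss:.4f}")
```

Once such an embedding is trained, comparing two new sequences needs only two embedding evaluations and one vector distance, with no alignment step, which is where the efficiency claimed in the abstract would come from.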

AVAILABILITY AND IMPLEMENTATION

Open-source software for the proposed method is developed and freely available at https://www.acsu.buffalo.edu/~yijunsun/lab/SENSE.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
