
Alignment-free comparison of metagenomics sequences via approximate string matching.

Authors

Chen Jian, Yang Le, Li Lu, Goodison Steve, Sun Yijun

Affiliations

Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA.

Department of Microbiology and Immunology, University at Buffalo, Buffalo, NY 14203, USA.

Publication information

Bioinform Adv. 2022 Oct 21;2(1):vbac077. doi: 10.1093/bioadv/vbac077. eCollection 2022.

DOI: 10.1093/bioadv/vbac077
PMID: 36388153
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9645238/

Abstract

SUMMARY

Quantifying pairwise sequence similarities is a key step in metagenomics studies. Alignment-free methods provide a computationally efficient alternative to alignment-based methods for large-scale sequence analysis. Several neural network-based methods have recently been developed for this purpose. However, existing methods do not perform well on sequences of varying lengths and are sensitive to the presence of insertions and deletions. In this article, we describe the development of a new method, referred to as AsMac, that addresses these issues. We proposed a novel neural network structure for approximate string matching that extracts pertinent information from biological sequences, and developed an efficient gradient computation algorithm for training the constructed neural network. We performed a large-scale benchmark study using real-world data that demonstrated the effectiveness and potential utility of the proposed method.

AVAILABILITY AND IMPLEMENTATION

The open-source software for the proposed method and trained neural-network models for some commonly used metagenomics marker genes were developed and are freely available at www.acsu.buffalo.edu/~yijunsun/lab/AsMac.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
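
The summary names approximate string matching as the core operation but does not spell it out. As an illustration only, the sketch below shows the classical dynamic-programming form of the problem: the smallest edit distance between a short pattern and any substring of a longer sequence. This is not the AsMac neural network or its training algorithm; the function name `approx_match_distance` and the unit edit costs are assumptions made for this example.

```python
def approx_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Illustrative classical approximate string matching with unit costs.
    It is NOT the AsMac model described in the abstract.
    """
    # prev[j]: best cost of matching the pattern prefix processed so far
    # against some substring of `text` that ends at position j.
    # An empty pattern matches anywhere at zero cost, so the first row is 0.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        # Against an empty substring, all i pattern characters are unmatched.
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match (cost 0) or substitution (cost 1)
                prev[j] + 1,             # pattern character left unmatched
                curr[j - 1] + 1,         # extra character in the text
            )
        prev = curr
    # The match may end at any position in the text.
    return min(prev)


if __name__ == "__main__":
    # The pattern occurs exactly inside the text, then with one base missing.
    print(approx_match_distance("ACGT", "TTACGTTT"))
    print(approx_match_distance("ACGT", "TTACTTT"))
```

Because insertions and deletions each cost a single edit here, a pattern still scores well when the sequence carries a small indel, which is the sensitivity problem the abstract says existing alignment-free methods have.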


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/139c/9710603/39022742be09/vbac077f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/139c/9710603/d70efc91584b/vbac077f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/139c/9710603/1e75877e194d/vbac077f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/139c/9710603/49ff42f05aa2/vbac077f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/139c/9710603/442b10668462/vbac077f5.jpg


