Faster sequence homology searches by clustering subsequences.

Authors

Suzuki Shuji, Kakuta Masanori, Ishida Takashi, Akiyama Yutaka

Affiliations

Graduate School of Information Science and Engineering, Tokyo Institute of Technology and Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Tokyo 152-8550, Japan.

Publication information

Bioinformatics. 2015 Apr 15;31(8):1183-90. doi: 10.1093/bioinformatics/btu780. Epub 2014 Nov 27.

DOI:10.1093/bioinformatics/btu780
PMID:25432166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4393512/

Abstract

MOTIVATION

Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis.

RESULTS

We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2-2.8 times faster than RAPSearch and is ∼185-261 times faster than BLASTX.
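The pruning idea in this paragraph can be illustrated with a minimal sketch. It is not the GHOSTZ implementation: the Hamming distance, the greedy clustering, and the names `build_clusters`, `search`, `radius` and `max_dist` are all assumptions chosen for illustration. Each database subsequence stores its distance to a cluster representative, so a single query-to-representative distance yields a triangle-inequality lower bound for every member, and members whose bound exceeds the threshold are skipped without being compared.

```python
from typing import List, Tuple

# (representative, [(member, distance from member to representative)])
Cluster = Tuple[str, List[Tuple[str, int]]]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions (an assumed stand-in distance)."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_clusters(subseqs: List[str], radius: int) -> List[Cluster]:
    """Greedy clustering: join the first representative within `radius`,
    otherwise start a new cluster with this subsequence as representative."""
    clusters: List[Cluster] = []
    for s in subseqs:
        for rep, members in clusters:
            d = hamming(rep, s)
            if d <= radius:
                members.append((s, d))
                break
        else:
            clusters.append((s, [(s, 0)]))
    return clusters


def search(query: str, clusters: List[Cluster], max_dist: int):
    """Return (hits, pruned): members within `max_dist` of the query, and
    how many members were skipped using only the lower bound."""
    hits: List[Tuple[str, int]] = []
    pruned = 0
    for rep, members in clusters:
        d_qr = hamming(query, rep)  # one comparison per cluster
        for member, d_rm in members:
            # Triangle inequality: d(q, m) >= |d(q, r) - d(r, m)|
            if abs(d_qr - d_rm) > max_dist:
                pruned += 1
                continue
            # Distance 0 to the representative means the member equals it.
            d_qm = d_qr if d_rm == 0 else hamming(query, member)
            if d_qm <= max_dist:
                hits.append((member, d_qm))
    return hits, pruned


database = ["ACDEF", "ACDEG", "ACDEH", "WYWYW", "WYWYV"]
clusters = build_clusters(database, radius=1)
hits, pruned = search("ACDEG", clusters, max_dist=1)
print(hits)    # [('ACDEF', 1), ('ACDEG', 0), ('ACDEH', 1)]
print(pruned)  # 2
```

In this toy database the two members of the distant cluster are rejected after a single comparison against their representative, and the hits match an exhaustive scan. The benefit grows with cluster size, since one representative comparison can rule out many members.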

AVAILABILITY AND IMPLEMENTATION

The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/

CONTACT

akiyama@cs.titech.ac.jp

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
