
A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data.

Affiliations

Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 W8-76 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.

Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, 4259 J3-141 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8503, Japan.

Publication information

Int J Mol Sci. 2017 Oct 11;18(10):2124. doi: 10.3390/ijms18102124.

DOI:10.3390/ijms18102124
PMID:29019934
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5666806/
Abstract

Sequence similarity searches are widely used in the analysis of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a large number of nucleotide sequences, the time required for sequence similarity searches accounts for a large proportion of the total analysis time. This time-consuming step makes it difficult to perform large-scale analyses. To analyze large-scale metagenomic data, such as those found in the human oral microbiome, we developed GHOST-MP (Genome-wide HOmology Search Tool on Massively Parallel system), a parallel sequence similarity search tool for massively parallel computing systems. This tool uses a fast search algorithm based on suffix arrays of query and database sequences and a hierarchical parallel search to accelerate the large-scale sequence similarity search of metagenomic sequencing data. The parallel computing efficiency and the search speed of this tool were evaluated. GHOST-MP was shown to be scalable over 10,000 CPU (Central Processing Unit) cores, and achieved over 80-fold acceleration compared with mpiBLAST using the same computational resources. We applied this tool to human oral metagenomic data, and the results indicate that the oral cavity, the oral vestibule, and plaque have different characteristics based on the functional gene category.
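The abstract names suffix arrays as the basis of the search algorithm but does not describe the implementation. As a rough illustration of the underlying idea only, the sketch below builds a suffix array over a single database sequence and uses binary search to locate exact seed matches. It is not GHOST-MP's algorithm: the function names `build_suffix_array` and `find_seed`, the naive construction, and the toy protein string are all assumptions made for this example, and a real tool would also index the query side, extend seeds into alignments, and distribute the work across nodes.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically.

    Fine for a toy example; production tools use linear-time builders.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text: str, sa: list[int], seed: str) -> list[int]:
    """Return the sorted start positions of exact occurrences of `seed`.

    Because the suffixes are sorted, their length-k prefixes are also in
    non-decreasing order, so all matches form one contiguous block that
    two binary searches can delimit.
    """
    k = len(seed)

    # Lower bound: first suffix whose k-prefix is >= seed.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < seed:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-prefix is > seed.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= seed:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    database = "MKVLAAGMKVL"  # hypothetical toy protein sequence
    sa = build_suffix_array(database)
    print(find_seed(database, sa, "MKV"))
```

Each lookup costs O(k log n) character comparisons once the array is built, which is why an index of this kind suits workloads where many short query seeds are matched against one large reference database.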

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1100/5666806/a718b3b9fa3c/ijms-18-02124-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1100/5666806/3dc7c5bef4da/ijms-18-02124-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1100/5666806/f720345083f3/ijms-18-02124-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1100/5666806/b9c2a7ea3a06/ijms-18-02124-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1100/5666806/addb95c3a587/ijms-18-02124-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1100/5666806/876f6a692ad7/ijms-18-02124-g006.jpg
