MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture.

Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Lushannan Road, Changsha, 410082, China.

School of Computer Science and Engineering, Nanyang Technological University, Nangyang Road, Singapore, 639798, Singapore.

Publication information

BMC Bioinformatics. 2019 Jul 17;20(1):397. doi: 10.1186/s12859-019-2980-5.

DOI:10.1186/s12859-019-2980-5
PMID:31315562
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6637555/
Abstract

BACKGROUND

Tandem mass spectrometry (MS/MS)-based database searching is a widely accepted and widely used method for peptide identification in shotgun proteomics. However, because of the rapid growth of spectral data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, current methods for peptide database searching cannot process large MS/MS spectra datasets rapidly and thoroughly. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics.

RESULTS

This paper presents MCtandem, an efficient tool for large-scale peptide identification on the Intel Many Integrated Core (MIC) architecture. To support large-scale data processing, MCtandem's design introduces a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), together with its two-level parallelization. In addition, a series of optimization strategies on both the host CPU side and the MIC side, including prefetching, an optimized communication-overlapping scheme, multithreading, and hyper-threading, are applied to improve execution performance.
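
The abstract does not spell out how MIC-SDP works, so the following is only a minimal sketch of a generic spectrum dot product score, not the authors' implementation. It assumes that the observed and theoretical spectra have already been binned onto the same m/z grid and that intensities are non-negative. The names `BinnedSpectrum`, `spectrumDotProduct`, and `bestCandidate` are hypothetical. The comments mark where two levels of parallelism could plausibly be applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: element i holds the intensity that falls in m/z bin i.
using BinnedSpectrum = std::vector<float>;

// Spectrum dot product: sum over bins of observed * theoretical intensity.
// Assumes both spectra share the same binning and therefore the same length.
float spectrumDotProduct(const BinnedSpectrum& observed,
                         const BinnedSpectrum& theoretical) {
    assert(observed.size() == theoretical.size());
    float score = 0.0f;
    // Inner level: a contiguous multiply-add loop, a natural candidate
    // for vectorization on wide SIMD units.
    for (std::size_t i = 0; i < observed.size(); ++i) {
        score += observed[i] * theoretical[i];
    }
    return score;
}

// Scores one observed spectrum against every candidate and returns the
// index of the highest-scoring one (0 if the candidate list is empty).
// Assumes non-negative intensities, so every score is >= 0.
std::size_t bestCandidate(const BinnedSpectrum& observed,
                          const std::vector<BinnedSpectrum>& candidates) {
    std::size_t best = 0;
    float bestScore = -1.0f;
    // Outer level: each candidate is scored independently, so these
    // iterations could be distributed across threads.
    for (std::size_t c = 0; c < candidates.size(); ++c) {
        const float s = spectrumDotProduct(observed, candidates[c]);
        if (s > bestScore) {
            bestScore = s;
            best = c;
        }
    }
    return best;
}
```

Because each candidate is scored independently of the others, the outer loop is the obvious place for coarse-grained threading, while the inner loop is a plain reduction that a compiler can vectorize.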

CONCLUSIONS

For a fair comparison, we first set up experiments and verified a 28-fold speedup on a single MIC over the original CPU-based implementation. We then ran MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem .

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/58a3d6b60d41/12859_2019_2980_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/353aa8c4f9a5/12859_2019_2980_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/94a4e54cd7b5/12859_2019_2980_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/244f45c8e037/12859_2019_2980_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/873a687f37cd/12859_2019_2980_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/3aec2498e1c7/12859_2019_2980_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/061326a2d639/12859_2019_2980_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/903f/6637555/242f5da8ddd0/12859_2019_2980_Fig8_HTML.jpg
