
An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

Author information

Winkler Robert

Affiliation

Department of Biotechnology and Biochemistry, CINVESTAV Unidad Irapuato, Mexico.

Publication information

PeerJ. 2015 Nov 17;3:e1401. doi: 10.7717/peerj.1401. eCollection 2015.

DOI:10.7717/peerj.1401
PMID:26618079
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4655102/
Abstract

In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to arrive at the final results. These operations are often difficult to reproduce because they depend on overly specific computing platforms. This effect, known as 'workflow decay', can be diminished by using a standardized informatics infrastructure. Thus, we compiled an integrated platform that contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes, among others, the OpenMS/TOPPAS framework, the Trans-Proteomic Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analysis, such as XCMS/metaXCMS and MetabR. The R package Rattle provides user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) taverna. We explain the useful combination of the tools with practical examples: (1) a workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) cluster analysis and Data Mining in targeted Metabolomics, and (3) raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present their application for finding co-occurring peptides, which can be used for targeted proteomics, the discovery of alternative biomarkers and protein-protein interactions.
Models derived by Data Mining displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models provide not only predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly demonstrate the importance of Data Mining methods for disclosing non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform make the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.
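
The idea behind the Association Analysis of co-occurring peptides can be illustrated with a minimal Python sketch. It is not the workflow from the paper, which relies on the R-based tools shipped with MASSyPup64. The function name `pairwise_association`, the peptide labels and the support threshold are hypothetical and chosen only for illustration. Each sample is treated as the set of peptides identified in it, and pairs are scored by the usual support, confidence and lift measures.

```python
from collections import Counter
from itertools import combinations


def pairwise_association(samples, min_support=0.5):
    """Score peptide pairs that co-occur across samples.

    samples: iterable of collections of peptide identifiers (one per sample).
    Returns a list of rule dicts (lhs -> rhs) with support, confidence, lift.
    """
    sets = [set(s) for s in samples]
    n = len(sets)
    # How many samples contain each peptide, and each unordered pair.
    item_counts = Counter(item for s in sets for item in s)
    pair_counts = Counter(
        pair for s in sets for pair in combinations(sorted(s), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of samples containing both peptides
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. independence
            rules.append({
                "lhs": lhs, "rhs": rhs,
                "support": support, "confidence": confidence, "lift": lift,
            })
    # Strongest associations first; names break ties deterministically.
    return sorted(rules, key=lambda r: (-r["lift"], r["lhs"], r["rhs"]))


# Hypothetical identification results for four samples.
samples = [
    {"PEP_A", "PEP_B", "PEP_C"},
    {"PEP_A", "PEP_B"},
    {"PEP_A", "PEP_C"},
    {"PEP_B", "PEP_D"},
]

for rule in pairwise_association(samples, min_support=0.5):
    print(f"{rule['lhs']} -> {rule['rhs']}: "
          f"support={rule['support']:.2f}, "
          f"confidence={rule['confidence']:.2f}, lift={rule['lift']:.2f}")
```

A lift above 1 indicates that two peptides appear together more often than expected if they were independent. That is the kind of relationship the abstract proposes to exploit for targeted proteomics and alternative biomarkers. Real data sets would call for a dedicated frequent-itemset implementation rather than this exhaustive pair count.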
