SEAOP: a statistical ensemble approach for outlier detection in quantitative proteomics data.

Affiliations

College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China.

Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae129.

DOI: 10.1093/bib/bbae129

PMID: 38557674

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10982946/

Abstract

Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers. Therefore, we introduced SEAOP, a Python toolbox that utilizes an ensemble mechanism by integrating multi-round data management and a statistics-based decision pipeline with multiple models. Specifically, SEAOP uses multi-round resampling to create diverse sub-data spaces and employs outlier detection methods to identify candidate outliers in each space. Candidates are then aggregated as confirmed outliers via a chi-square test, adhering to a 95% confidence level, to ensure the precision of the unsupervised approaches. Additionally, SEAOP introduces a visualization strategy, specifically designed to intuitively and effectively display the distribution of both outlier and non-outlier samples. Optimal hyperparameter models of SEAOP for outlier detection were identified by using a gradient-simulated standard dataset and Mann-Kendall trend test. The performance of the SEAOP toolbox was evaluated using three experimental datasets, confirming its reliability and accuracy in handling quantitative proteomics.
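
The resample, detect, and aggregate idea described in the abstract can be illustrated with a minimal sketch. This is not SEAOP's code or API: the function names, the robust z-score base detector, the 80% subsampling fraction, the 3.5 cutoff, and the exact form of the chi-square test (a one-degree-of-freedom goodness-of-fit test of each sample's flag count against the pooled flag rate) are all assumptions made for illustration. Only the overall scheme and the 95% confidence level come from the abstract.

```python
import math
import random
import statistics


def robust_scores(rows):
    """Score each sample by its median absolute robust z-score across features."""
    centers, scales = [], []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        med = statistics.median(column)
        mad = statistics.median(abs(v - med) for v in column)
        centers.append(med)
        scales.append(1.4826 * mad or 1e-12)  # guard against a zero MAD
    return [
        statistics.median(abs(v - c) / s for v, c, s in zip(row, centers, scales))
        for row in rows
    ]


def ensemble_outliers(data, rounds=50, frac=0.8, cutoff=3.5, alpha=0.05, seed=0):
    """Hypothetical helper: resample, flag candidates, confirm by chi-square test."""
    rng = random.Random(seed)
    n = len(data)
    size = min(n, max(3, int(frac * n)))
    included = [0] * n  # rounds in which each sample was drawn
    flagged = [0] * n   # rounds in which it was flagged as a candidate
    for _ in range(rounds):
        idx = rng.sample(range(n), size)  # one sub-data space
        scores = robust_scores([data[i] for i in idx])
        for i, score in zip(idx, scores):
            included[i] += 1
            flagged[i] += score > cutoff
    p0 = sum(flagged) / sum(included)  # pooled candidate rate over all draws
    if not 0 < p0 < 1:
        return []  # nothing (or everything) was flagged; the test is undefined
    confirmed = []
    for i in range(n):
        if included[i] == 0:
            continue
        exp_flag = included[i] * p0
        exp_clean = included[i] * (1 - p0)
        chi2 = ((flagged[i] - exp_flag) ** 2 / exp_flag
                + ((included[i] - flagged[i]) - exp_clean) ** 2 / exp_clean)
        # Survival function of a chi-square variable with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(chi2 / 2))
        # Confirm only samples flagged MORE often than the pooled rate predicts.
        if flagged[i] > exp_flag and p_value < alpha:
            confirmed.append(i)
    return confirmed


# Simulated intensities: 20 samples x 30 proteins, with samples 3 and 17 shifted.
rng = random.Random(42)
data = [[rng.gauss(10.0, 1.0) for _ in range(30)] for _ in range(20)]
for i in (3, 17):
    data[i] = [v + 8.0 for v in data[i]]

confirmed = ensemble_outliers(data)
print(confirmed)
```

In this toy setting the two shifted samples are flagged in every round that includes them, so their flag counts sit far above the pooled rate and they pass the test at `alpha=0.05`. A sample flagged only occasionally, for example because of an unlucky subsample, would not.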

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce1d/10982946/848c4d6559f8/bbae129f1.jpg
