EnsMOD: A Software Program for Omics Sample Outlier Detection.

Affiliation

Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

J Comput Biol. 2023 Jun;30(6):726-735. doi: 10.1089/cmb.2022.0243. Epub 2023 Apr 12.

Abstract

Detection of omics sample outliers is important for preventing erroneous biological conclusions, developing robust experimental protocols, and discovering rare biological states. Two recent publications describe robust algorithms for detecting transcriptomic sample outliers, but neither algorithm had been incorporated into a software tool for scientists. Here we describe Ensemble Methods for Outlier Detection (EnsMOD), which incorporates both algorithms. EnsMOD calculates how closely the quantitation variation follows a normal distribution, plots the density curves of each sample to visualize anomalies, performs hierarchical cluster analyses to calculate how closely the samples cluster with each other, and performs robust principal component analyses to statistically test whether any sample is an outlier. The probabilistic threshold parameters can be easily adjusted to tighten or loosen the outlier detection stringency. EnsMOD can be used to analyze any omics dataset with normally distributed variance. Here it was used to analyze a simulated proteomics dataset, a multiomic (proteome and transcriptome) dataset, a single-cell proteomics dataset, and a phosphoproteomics dataset. EnsMOD successfully identified all of the simulated outliers, and subsequent removal of a detected outlier improved data quality for downstream statistical analyses.
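
The ensemble idea in the abstract (several independent checks, each with an adjustable threshold, combined into one call) can be illustrated with a small sketch. This is not EnsMOD's implementation: EnsMOD uses hierarchical clustering and robust principal component analysis, whereas the sketch below swaps in two much simpler detectors (mean correlation with the other samples, and distance to a feature-wise median profile) scored with median/MAD robust z-scores. The function names `robust_z`, `pearson`, and `flag_outliers`, the parameters `z_cut` and `min_votes`, and the simulated data are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def robust_z(values):
    """Robust z-scores: (x - median) / (1.4826 * MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        scale = 1e-12  # guard against a zero MAD
    return [(v - med) / scale for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_outliers(samples, z_cut=3.0, min_votes=2):
    """Return names of samples flagged by at least `min_votes` detectors.

    samples: dict mapping sample name -> list of feature values.
    z_cut:   robust z-score cutoff; raise it to loosen detection,
             lower it to tighten detection (illustrative parameter).
    """
    names = list(samples)
    n_feat = len(samples[names[0]])

    # Detector 1 (stand-in for clustering cohesion): mean correlation
    # with every other sample. Unusually low values are suspicious.
    mean_corr = [
        statistics.fmean(
            [pearson(samples[a], samples[b]) for b in names if b != a]
        )
        for a in names
    ]

    # Detector 2 (stand-in for a robust multivariate distance): Euclidean
    # distance to the feature-wise median profile. High is suspicious.
    median_profile = [
        statistics.median([samples[n][j] for n in names])
        for j in range(n_feat)
    ]
    dist = [math.dist(samples[n], median_profile) for n in names]

    z_corr = robust_z(mean_corr)
    z_dist = robust_z(dist)

    votes = {
        n: int(zc < -z_cut) + int(zd > z_cut)
        for n, zc, zd in zip(names, z_corr, z_dist)
    }
    return [n for n in names if votes[n] >= min_votes]


# Simulated data (assumption): 10 samples share one profile with small
# noise; one sample carries much larger noise and should be flagged.
rng = random.Random(0)
base = [rng.gauss(10, 2) for _ in range(200)]
samples = {f"S{i}": [b + rng.gauss(0, 0.3) for b in base] for i in range(10)}
samples["S_outlier"] = [b + rng.gauss(0, 3.0) for b in base]

flagged = flag_outliers(samples)
print(flagged)
```

Requiring agreement between detectors (`min_votes=2`) trades sensitivity for fewer false calls; setting `min_votes=1` or lowering `z_cut` would make the sketch more aggressive, which loosely mirrors the adjustable stringency the abstract describes.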

Similar articles

1
EnsMOD: A Software Program for Omics Sample Outlier Detection.
J Comput Biol. 2023 Jun;30(6):726-735. doi: 10.1089/cmb.2022.0243. Epub 2023 Apr 12.
2
Robust principal component analysis for accurate outlier sample detection in RNA-Seq data.
BMC Bioinformatics. 2020 Jun 29;21(1):269. doi: 10.1186/s12859-020-03608-0.
3
STAR_outliers: a python package that separates univariate outliers from non-normal distributions.
BioData Min. 2023 Sep 4;16(1):25. doi: 10.1186/s13040-023-00342-0.
4
5
Comparison of methods for the detection of outliers and associated biomarkers in mislabeled omics data.
BMC Bioinformatics. 2020 Aug 14;21(1):357. doi: 10.1186/s12859-020-03653-9.
6
Entropy-based grid approach for handling outliers: a case study to environmental monitoring data.
Environ Sci Pollut Res Int. 2023 Dec;30(60):125138-125157. doi: 10.1007/s11356-023-26780-1. Epub 2023 Jun 12.
7
Detecting outlier samples in microarray data.
Stat Appl Genet Mol Biol. 2009;8:Article 13. doi: 10.2202/1544-6115.1426. Epub 2009 Feb 11.
8
Detecting EEG outliers for BCI on the Riemannian manifold using spectral clustering.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:438-441. doi: 10.1109/EMBC44109.2020.9175456.
9
QuanTP: A Software Resource for Quantitative Proteo-Transcriptomic Comparative Data Analysis and Informatics.
J Proteome Res. 2019 Feb 1;18(2):782-790. doi: 10.1021/acs.jproteome.8b00727. Epub 2019 Jan 4.
10
ROSIE: RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data.
Stat Methods Med Res. 2022 May;31(5):947-958. doi: 10.1177/09622802211072456. Epub 2022 Jan 24.

Cited by

1
The Associations of Air Pollution Mixture Exposure with Plasma Proteins in an Elderly U.S. Panel.
Environ Sci Technol. 2025 Aug 5;59(30):15692-15704. doi: 10.1021/acs.est.5c03052. Epub 2025 Jul 24.

References

1
Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2.
Genome Biol. 2021 Jan 27;22(1):50. doi: 10.1186/s13059-021-02267-5.
2
FEATS: feature selection-based clustering of single-cell RNA-seq data.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa306.
3
Comparison of methods for the detection of outliers and associated biomarkers in mislabeled omics data.
BMC Bioinformatics. 2020 Aug 14;21(1):357. doi: 10.1186/s12859-020-03653-9.
4
Robust principal component analysis for accurate outlier sample detection in RNA-Seq data.
BMC Bioinformatics. 2020 Jun 29;21(1):269. doi: 10.1186/s12859-020-03608-0.
5
MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions.
Genome Biol. 2019 Oct 11;20(1):206. doi: 10.1186/s13059-019-1812-2.
6
scRNABatchQC: multi-samples quality control for single cell RNA-seq data.
Bioinformatics. 2019 Dec 15;35(24):5306-5308. doi: 10.1093/bioinformatics/btz601.
7
Lineage Inference and Stem Cell Identity Prediction Using Single-Cell RNA-Sequencing Data.
Methods Mol Biol. 2019;1975:277-301. doi: 10.1007/978-1-4939-9224-9_13.
8
scPipe: A flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data.
PLoS Comput Biol. 2018 Aug 10;14(8):e1006361. doi: 10.1371/journal.pcbi.1006361. eCollection 2018 Aug.
9
Ensemble outlier detection and gene selection in triple-negative breast cancer data.
BMC Bioinformatics. 2018 May 4;19(1):168. doi: 10.1186/s12859-018-2149-7.
10
An accurate and robust imputation method scImpute for single-cell RNA-seq data.
Nat Commun. 2018 Mar 8;9(1):997. doi: 10.1038/s41467-018-03405-7.
