Outlier detection using projection quantile regression for mass spectrometry data with low replication.

Author information

Eo Soo-Heang, Pak Daewoo, Choi Jeea, Cho HyungJun

Affiliation

Department of Statistics, Korea University, Seoul, Korea.

Publication information

BMC Res Notes. 2012 May 15;5:236. doi: 10.1186/1756-0500-5-236.

Abstract

BACKGROUND

Mass spectrometry (MS) data are generated in many biological and chemical experiments and may contain outlying observations, that is, values that are extreme for technical reasons. Identifying these observations matters in the analysis of replicated MS data: reliable results depend on careful pre-processing, and manual outlier detection, one of the pre-processing steps, is time-consuming. Heterogeneous variability and low replication are frequent obstacles to successful analysis, including outlier detection. Existing approaches, which assume constant variability, can generate many false positives (non-outliers flagged as outliers) and/or false negatives (outliers left undetected). A more powerful and accurate approach is therefore needed, one that accounts for both heterogeneous variability and low replication.

FINDINGS

We proposed an outlier detection algorithm using projection and quantile regression in MS data from multiple experiments. The performance of the algorithm and program was demonstrated by using both simulated and real-life data. The projection approach with linear, nonlinear, or nonparametric quantile regression was appropriate in heterogeneous high-throughput data with low replication.

CONCLUSION

Various quantile regression approaches combined with projection were proposed for detecting outliers. The choice among linear, nonlinear, and nonparametric regressions is dependent on the degree of heterogeneity of the data. The proposed approach was illustrated with MS data with two or more replicates.
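
The abstract does not spell out the computation, so the following is only a minimal sketch of how projection combined with linear quantile regression could flag outliers in duplicated data. Several pieces are assumptions rather than the authors' stated method: the projection is taken to be the average A and difference M of two log-transformed replicates, the fences are Tukey-style (first and third conditional quartiles widened by `k` times their gap), and the names `pinball_loss`, `fit_linear_quantile`, `detect_outliers`, and `k` are hypothetical. The quantile fit uses an exhaustive search that is exact for one covariate but practical only for small inputs.

```python
import itertools


def pinball_loss(residuals, tau):
    """Check (pinball) loss that quantile regression minimizes."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def fit_linear_quantile(a, m, tau):
    """Fit m ~ b0 + b1 * a at quantile level tau; return (b0, b1).

    With a single covariate, some optimal line passes through two data
    points, so an exhaustive search over pairs is exact. It costs O(n^3)
    and is meant only for small illustrative inputs.
    """
    best = None
    for i, j in itertools.combinations(range(len(a)), 2):
        if a[i] == a[j]:
            continue  # vertical line, not a valid regression fit
        b1 = (m[j] - m[i]) / (a[j] - a[i])
        b0 = m[i] - b1 * a[i]
        loss = pinball_loss([mk - (b0 + b1 * ak) for ak, mk in zip(a, m)], tau)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two points with distinct A values")
    return best[1], best[2]


def detect_outliers(rep1, rep2, k=3.0):
    """Flag peaks whose replicate difference falls outside the fences."""
    # Projection (assumed form): average A and difference M of the replicates.
    a = [(x + y) / 2 for x, y in zip(rep1, rep2)]
    m = [x - y for x, y in zip(rep1, rep2)]
    lo0, lo1 = fit_linear_quantile(a, m, 0.25)  # conditional first quartile
    hi0, hi1 = fit_linear_quantile(a, m, 0.75)  # conditional third quartile
    flags = []
    for ak, mk in zip(a, m):
        q1 = lo0 + lo1 * ak
        q3 = hi0 + hi1 * ak
        iqr = max(q3 - q1, 0.0)  # guard against crossing quantile lines
        flags.append(mk < q1 - k * iqr or mk > q3 + k * iqr)
    return flags
```

Calling `detect_outliers(rep1, rep2)` with two equal-length lists of log intensities returns one boolean per peak. Because the fences follow the fitted quartile lines, they widen where replicate differences are more variable and narrow where they are less so, which is the motivation for not assuming constant variability. For strongly curved heterogeneity, the linear fit would be swapped for a nonlinear or nonparametric quantile fit, and a real analysis would use an established quantile regression routine rather than this brute-force search.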

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9694/3514222/008bdd2edce5/1756-0500-5-236-1.jpg
