Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens.

Author information

Taylor Sandra L, Ruhaak L Renee, Weiss Robert H, Kelly Karen, Kim Kyoungmi

Affiliations

Division of Biostatistics, Department of Public Health Sciences, University of California Davis, CA, 95616, USA.

Department of Clinical Chemistry and Laboratory Medicine, Leiden University Medical Center, Leiden, The Netherlands.

Publication information

Bioinformatics. 2017 Jan 1;33(1):17-25. doi: 10.1093/bioinformatics/btw578. Epub 2016 Sep 4.

DOI:10.1093/bioinformatics/btw578

PMID:27592710

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6075023/

Abstract

MOTIVATION

High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects, with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data, and imputation can affect both the between-biospecimen correlation and the results of multivariate analysis.

RESULTS

We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds were differentially regulated in multiple biospecimens, but that univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen.
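The abstract does not state the formulas for the two proposed statistics, so the following is only a minimal sketch of the general idea, not the authors' implementation (their R functions are in the supplementary information). It assumes that a per-biospecimen two-part statistic is the sum of a squared two-proportion z-statistic on the fraction of observed values and a squared Welch t-statistic on the observed intensities, and that the multivariate statistic is the plain sum over biospecimens. The function names `two_part_stat`, `multi_stat`, and `permutation_pvalue` are hypothetical. Group labels are permuted across subjects while each subject's row is kept intact, so the between-biospecimen correlation is preserved under the null.

```python
import numpy as np


def two_part_stat(x, g):
    """Two-part statistic for one compound in one biospecimen (assumed form).

    x: 1-D float array of intensities, NaN marks a missing value.
    g: 1-D boolean array, True for group 1 and False for group 0.
    """
    obs = ~np.isnan(x)
    n1, n0 = g.sum(), (~g).sum()

    # Part 1: squared two-proportion z-statistic on the fraction observed.
    p1, p0, p = obs[g].mean(), obs[~g].mean(), obs.mean()
    z2 = 0.0
    if 0.0 < p < 1.0:
        z2 = (p1 - p0) ** 2 / (p * (1 - p) * (1 / n1 + 1 / n0))

    # Part 2: squared Welch t-statistic on the observed values only.
    a, b = x[g & obs], x[~g & obs]
    t2 = 0.0
    if a.size > 1 and b.size > 1:
        se2 = a.var(ddof=1) / a.size + b.var(ddof=1) / b.size
        if se2 > 0:
            t2 = (a.mean() - b.mean()) ** 2 / se2

    return z2 + t2


def multi_stat(X, g):
    """Sum the two-part statistic over biospecimens (columns of X)."""
    return sum(two_part_stat(X[:, j], g) for j in range(X.shape[1]))


def permutation_pvalue(X, g, n_perm=999, seed=0):
    """Permutation p-value for the summed statistic.

    Only the group labels are shuffled; each subject's row of X stays
    intact, which keeps the correlation between biospecimens unchanged.
    """
    rng = np.random.default_rng(seed)
    observed = multi_stat(X, g)
    exceed = 0
    for _ in range(n_perm):
        if multi_stat(X, rng.permutation(g)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Here `X` would hold one compound's intensities with subjects in rows and biospecimens in columns, and `g` the case/control labels. No imputation is needed because missingness enters through the first part of the statistic.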

AVAILABILITY AND IMPLEMENTATION

We provide R functions to implement and illustrate our method as supplementary information.

Contact: sltaylor@ucdavis.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
