Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical "-omics" Data with Multiple Explanatory Variables.

Author information

Brown Amber O, Green Peter J, Frankham Greta J, Stuart Barbara H, Ueland Maiken

Affiliations

Australian Museum Research Institute, Australian Museum, Sydney 2001, NSW, Australia.

Centre for Forensic Science, University of Technology Sydney, Ultimo 2007, NSW, Australia.

Publication information

ACS Omega. 2023 Jun 9;8(24):22042-22054. doi: 10.1021/acsomega.3c01613. eCollection 2023 Jun 20.

Abstract

Biological volatilome analysis is inherently complex because of the considerable number of compounds (i.e., dimensions) and because peak areas differ by orders of magnitude both between and within compounds in a dataset. Traditional volatilome analysis relies on dimensionality reduction techniques, which aid in selecting the compounds considered relevant to the research question prior to further analysis. Currently, compounds of interest are identified using either supervised or unsupervised statistical methods that assume the data residuals are normally distributed and exhibit linearity. However, biological data often violate the assumptions of these models, both with respect to normality and because multiple explanatory variables are innate to biological samples. In an attempt to address deviations from normality, volatilome data can be log transformed. However, whether the effects of each assessed variable are additive or multiplicative should be considered prior to transformation, as this will impact the effect of each variable on the data. If assumptions of normality and variable effects are not investigated prior to dimensionality reduction, ineffective or erroneous dimensionality reduction can impact downstream analyses. The aim of this manuscript is to assess the impact of single and multivariable statistical models, with and without the log transformation, on volatilome dimensionality reduction prior to any supervised or unsupervised classification analysis. As a proof of concept, Shingleback lizard (Tiliqua rugosa) volatilomes were collected across the species' distribution and from captivity and were assessed. Shingleback volatilomes are suspected to be influenced by multiple explanatory variables related to habitat (Bioregion), sex, parasite presence, total body volume, and captive status.
This work determined that excluding relevant explanatory variables from the analysis overestimates both the effect of Bioregion and the number of compounds identified as significant. The log transformation increased the number of compounds that were identified as significant, as did analyses that assumed that residuals were normally distributed. Among the methods considered in this work, the most conservative form of dimensionality reduction was achieved by analyzing untransformed data using Monte Carlo tests with multiple explanatory variables.
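
To make the idea of a Monte Carlo test on untransformed versus log-transformed peak areas concrete, the following is a minimal sketch and not the authors' procedure. It assumes a single two-level explanatory variable, whereas the study uses several. The data are simulated with a hypothetical multiplicative group effect and log-normal noise, and the function names (`group_mean_difference`, `monte_carlo_p_value`, `simulate_peak_areas`) and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def group_mean_difference(values, labels):
    """Absolute difference between the mean values of two groups (labels 0 and 1)."""
    group0 = [v for v, g in zip(values, labels) if g == 0]
    group1 = [v for v, g in zip(values, labels) if g == 1]
    return abs(statistics.fmean(group1) - statistics.fmean(group0))


def monte_carlo_p_value(values, labels, n_permutations=2000, seed=0):
    """Estimate a permutation p-value by randomly reshuffling the group labels.

    No normality assumption is made: the null distribution of the test
    statistic is built from the data themselves.
    """
    rng = random.Random(seed)
    observed = group_mean_difference(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between label and value
        if group_mean_difference(values, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


def simulate_peak_areas(n_per_group=20, seed=1):
    """Hypothetical peak areas for one compound (assumed values, not real data).

    Group 1 is scaled by a factor of 3 (a multiplicative effect), and the
    noise is log-normal, so the raw values are right-skewed.
    """
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    values = [
        1000.0 * (3.0 if g == 1 else 1.0) * math.exp(rng.gauss(0.0, 1.0))
        for g in labels
    ]
    return values, labels


values, labels = simulate_peak_areas()
p_raw = monte_carlo_p_value(values, labels)
p_log = monte_carlo_p_value([math.log(v) for v in values], labels)
print(f"untransformed p = {p_raw:.4f}, log-transformed p = {p_log:.4f}")
```

On the log scale a multiplicative effect becomes an additive shift, so the two p-values can differ for the same compound. Extending the sketch to several explanatory variables would require a model-based test statistic and a suitable permutation scheme, neither of which is shown here.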

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b8a/10286096/f8bbeb6ebe6d/ao3c01613_0002.jpg
