Principal component analysis for designed experiments.

Author information

Konishi Tomokazu

Publication information

BMC Bioinformatics. 2015;16 Suppl 18:S7. doi: 10.1186/1471-2105-16-S18-S7. Epub 2015 Dec 9.

Abstract

BACKGROUND

Principal component analysis summarizes matrix data, such as those from transcriptome, proteome, or metabolome studies and medical examinations, in fewer dimensions by fitting the matrix to orthogonal axes. Although the method is frequently used in multivariate analyses, it has disadvantages when applied to experimental data. First, the identified principal components have poor generality: because the size and direction of the components depend on the particular data set, the components are valid only within that data set. Second, the method is sensitive to experimental noise and to bias between sample groups. It cannot reflect an experimental design planned to manage noise and bias; rather, it treats all samples in the matrix as equally weighted and independent. Third, the resulting components are often difficult to interpret. To address these issues, several options were introduced to the methodology. First, the principal axes were identified using training data sets and shared across experiments. These training data reflect the design of the experiments, and their preparation allows noise to be reduced and group bias to be removed. Second, the center of the rotation was determined in accordance with the experimental design. Third, the resulting components were scaled to unify their size unit.
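The three options can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the author's implementation: the function names `fit_axes` and `scaled_scores`, the synthetic data, the use of group means as training data, the choice of the control-group mean as the center, and the division by the square root of the number of items are all assumptions made here for the example.

```python
import numpy as np


def fit_axes(training, center, n_axes=2):
    """Return principal axes (as rows) of the training data about `center`."""
    centered = np.asarray(training, dtype=float) - center
    # Rows of vt are orthonormal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_axes]


def scaled_scores(data, center, axes):
    """Project samples onto shared axes, scaled by sqrt(number of items).

    The sqrt(n_items) scaling is an assumption of this sketch.
    """
    data = np.asarray(data, dtype=float)
    return (data - center) @ axes.T / np.sqrt(data.shape[1])


# Synthetic stand-in for a designed experiment: 3 groups x 4 replicates.
rng = np.random.default_rng(0)
n_items = 200
group_effects = rng.normal(0.0, 1.0, size=(3, n_items))
group_effects[0] = 0.0  # group 0 plays the role of the control
labels = np.repeat([0, 1, 2], 4)
samples = group_effects[labels] + rng.normal(0.0, 0.3, size=(12, n_items))

# Training data: one row per group mean, which averages out replicate noise.
training = np.stack([samples[labels == g].mean(axis=0) for g in range(3)])
center = training[0]  # center of rotation placed at the control group

axes = fit_axes(training, center)  # axes can be reused for other data sets
scores = scaled_scores(samples, center, axes)
```

With this scaling, duplicating every item (column) leaves the magnitude of the scores unchanged, which is one way to make scores comparable between data sets that contain different numbers of items.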

RESULTS

The effects of these options were observed in microarray experiments: both the separation of groups and the robustness to noise improved. The range of scaled scores was unaffected by the number of items. Additionally, unknown samples were appropriately classified using pre-arranged axes, and these axes reflected the characteristics of the groups in the experiments well. The scaling of the components and the sharing of axes enabled components to be compared across experiments. The use of training data reduced the effects of noise and bias in the data, facilitating the physical interpretation of the principal axes.
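The abstract does not say how unknown samples were assigned to groups. As one possible illustration, the sketch below assumes a nearest-centroid rule applied to scores on pre-arranged axes. The helper names `group_centroids` and `classify`, the group labels, and the score values are hypothetical and are not taken from the paper.

```python
import math


def group_centroids(scores, labels):
    """Average the score vectors of each labelled group."""
    sums, counts = {}, {}
    for vec, lab in zip(scores, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(score, centroids):
    """Return the label whose centroid is closest in the score space."""
    return min(centroids, key=lambda lab: math.dist(score, centroids[lab]))


# Hypothetical scaled scores on two shared axes (not data from the paper).
known_scores = [(0.0, 0.1), (0.1, -0.1), (1.0, 0.2),
                (0.9, 0.0), (0.1, 1.1), (-0.1, 0.9)]
known_labels = ["control", "control", "treated_a",
                "treated_a", "treated_b", "treated_b"]

centroids = group_centroids(known_scores, known_labels)
predicted = classify((0.8, 0.1), centroids)  # score of an unknown sample
```

Because the axes and the scaling are fixed in advance, the unknown sample is projected with the same transformation as the known groups, so its score can be compared with their centroids directly.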

CONCLUSIONS

Together, these options improve the generality and objectivity of the analytical results. The methodology has thus become more like a set of multiple regression analyses, each finding an independent model that specifies one of the axes.
