Principal component analysis based unsupervised feature extraction applied to budding yeast temporally periodic gene expression.

Author information

Taguchi Y-H

Affiliation

Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551 Japan.

Publication information

BioData Min. 2016 Jun 29;9:22. doi: 10.1186/s13040-016-0101-9. eCollection 2016.

BACKGROUND

The recently proposed principal component analysis (PCA)-based unsupervised feature extraction (FE) has been applied successfully to various bioinformatics problems, ranging from biomarker identification to the screening of disease-causing genes using gene expression/epigenetic profiles. However, the conditions required for its successful use and the mechanisms by which it outperforms supervised methods are unknown, because PCA-based unsupervised FE has only been applied to challenging (i.e., not well known) problems.

RESULTS

In this study, PCA-based unsupervised FE was applied to an extensively studied organism, budding yeast. When applied to two gene expression profiles expected to be temporally periodic, the yeast metabolic cycle (YMC) and the yeast cell division cycle (YCDC), PCA-based unsupervised FE outperformed a simple but powerful conventional method, sinusoidal fitting, in several respects: (i) feasible biological term enrichment without assuming periodicity for YMC; (ii) identification of periodic profiles whose period was half as long as the cell division cycle for YMC; and (iii) identification of no more than 37 genes associated with the enrichment of biological terms related to the cell division cycle in the integrated analysis of seven YCDC profiles, for which sinusoidal fitting failed. The explanation for the differences between the methods, and the conditions required for success, were determined by comparing PCA-based unsupervised FE with sinusoidal fitting on various periodic (artificial, thus pre-defined) profiles. Furthermore, four popular unsupervised clustering algorithms applied to YMC were not as successful as PCA-based unsupervised FE.
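
The contrast between the two approaches can be illustrated with a minimal Python sketch on artificial periodic profiles. This is a toy under stated assumptions, not the procedure used in the study: the synthetic data, the function names (make_toy_profiles, sinusoid_r2, pca_gene_scores, select_outlier_genes), the use of R-squared as the fitting score, and the rule of ranking genes by standardized distance in the space of the first two principal components are all hypothetical choices made for illustration. The sketch fits a sinusoid with a fixed, assumed period to each gene, and separately embeds genes by PCA without assuming any period.

```python
import numpy as np


def make_toy_profiles(n_genes=200, n_periodic=20, n_times=36, period=12.0, seed=0):
    """Artificial data (assumption): noise for all genes, plus a sinusoid
    with random phase added to the first n_periodic genes."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times)
    x = rng.normal(0.0, 1.0, size=(n_genes, n_times))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_periodic)
    x[:n_periodic] += 3.0 * np.sin(2.0 * np.pi * t / period + phases[:, None])
    return t, x


def sinusoid_r2(t, x, period):
    """Least-squares fit of a + b*sin + c*cos per gene with the period fixed
    in advance; returns the fraction of variance explained for each gene."""
    omega = 2.0 * np.pi / period
    design = np.column_stack(
        [np.ones_like(t, dtype=float), np.sin(omega * t), np.cos(omega * t)]
    )
    coef, *_ = np.linalg.lstsq(design, x.T, rcond=None)
    fitted = (design @ coef).T
    ss_res = ((x - fitted) ** 2).sum(axis=1)
    ss_tot = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - ss_res / ss_tot


def pca_gene_scores(x, n_components=2):
    """PCA with genes as the embedded points: center each time point across
    genes, then take the leading components from an SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # one row per gene
    loadings = vt[:n_components]                     # one row per component
    return scores, loadings


def select_outlier_genes(scores, n_select):
    """Hypothetical selection rule: rank genes by squared distance from the
    origin after standardizing each component's scores."""
    z = scores / scores.std(axis=0)
    dist = (z ** 2).sum(axis=1)
    return np.argsort(dist)[::-1][:n_select]


t, x = make_toy_profiles()
r2_assumed_ok = sinusoid_r2(t, x, period=12.0)   # assumed period matches the data
r2_assumed_off = sinusoid_r2(t, x, period=24.0)  # assumed period is twice the true one
scores, loadings = pca_gene_scores(x)
selected = select_outlier_genes(scores, n_select=20)
print("mean R^2, periodic genes, matching period:", r2_assumed_ok[:20].mean())
print("mean R^2, periodic genes, doubled period: ", r2_assumed_off[:20].mean())
print("periodic genes among PCA-selected:", np.intersect1d(selected, np.arange(20)).size)
```

In this toy setting the fitting score depends on the period supplied to sinusoid_r2, whereas the PCA embedding receives no period at all and the temporal pattern can be read from the rows of loadings afterwards. Whether the same behavior holds for real YMC or YCDC data is not something this sketch can show.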

CONCLUSIONS

PCA-based unsupervised FE is a useful and effective unsupervised method for investigating YMC and YCDC. This study identified why an unsupervised method without pre-judged criteria outperformed supervised methods requiring human-defined criteria.
