Alanis-Lobato Gregorio, Cannistraci Carlo Vittorio, Eriksson Anders, Manica Andrea, Ravasi Timothy
[1] Integrative Systems Biology Laboratory, Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Ibn Al Haytham Bldg. 2, Level 4, Thuwal 23955-6900, Kingdom of Saudi Arabia [2] Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA.
Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany.
Sci Rep. 2015 Jan 30;5:8140. doi: 10.1038/srep08140.
Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data.
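To make the PCA/ncMCE contrast concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not spell out the algorithm, so the steps here are assumptions drawn from the general minimum-curvilinearity idea: build a minimum spanning tree over pairwise sample distances, take path lengths along that tree as a nonlinear distance matrix, and embed it with a singular value decomposition without centring. The function names (`pca_embed`, `ncmce_embed`), the Euclidean metric, and the `skip_first` option are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def pca_embed(X, dims=2):
    """Linear baseline: project centred samples onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :dims] * s[:dims]


def ncmce_embed(X, dims=2, metric="euclidean", skip_first=True):
    """Sketch of a non-centred, MST-based embedding (assumed steps, see text).

    X is a samples-by-features matrix, e.g. individuals by SNP genotypes.
    """
    # 1. Pairwise distances between samples (metric choice is an assumption).
    D = squareform(pdist(X, metric=metric))

    # 2. Minimum spanning tree over the complete distance graph.
    #    Caveat: SciPy treats zero entries as "no edge", so exact duplicate
    #    samples would need to be merged or jittered beforehand.
    mst = minimum_spanning_tree(D)

    # 3. Path lengths along the tree serve as the nonlinear distance matrix.
    mc = shortest_path(mst, method="D", directed=False)

    # 4. SVD of the tree-distance matrix WITHOUT centring it first.
    _, s, Vt = np.linalg.svd(mc)
    coords = (np.sqrt(s)[:, None] * Vt).T

    # 5. Assumption: without centring, the first dimension mostly encodes
    #    overall distance magnitude, so it is skipped by default.
    start = 1 if skip_first else 0
    return coords[:, start:start + dims]
```

A side-by-side use, in the spirit of the juxtaposition the abstract recommends, could look like this on a hypothetical genotype matrix `G` (rows are individuals, columns are SNPs coded 0/1/2): `pca_xy = pca_embed(G)` and `ncmce_xy = ncmce_embed(G)`, followed by two scatter plots coloured by population label.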