Li Wentian, Cerise Jane E, Yang Yaning, Han Henry
* Robert S Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY 11030, USA.
† Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China.
J Bioinform Comput Biol. 2017 Aug;15(4):1750017. doi: 10.1142/S0219720017500172. Epub 2017 Jun 23.
The t-distributed stochastic neighbor embedding (t-SNE) is a new dimension reduction and visualization technique for high-dimensional data. Although t-SNE is commonly used in other data-intensive biological fields, such as single-cell genomics, it is rarely applied to human genetic data. We explore the applicability of t-SNE to human genetic data and make these observations: (i) like previously used dimension reduction techniques such as principal component analysis (PCA), t-SNE is able to separate samples from different continents; (ii) unlike PCA, t-SNE is robust to the presence of outliers; (iii) t-SNE is able to display both continental and sub-continental patterns in a single plot. We conclude that the ability of t-SNE to reveal population stratification at different scales could be useful for human genetic association studies.