
Visualization of SNPs with t-SNE.

Affiliation

Gregor Mendel Institute, Vienna, Austria.

Publication information

PLoS One. 2013;8(2):e56883. doi: 10.1371/journal.pone.0056883. Epub 2013 Feb 15.

DOI:10.1371/journal.pone.0056883
PMID:23457633
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3574019/
Abstract

BACKGROUND

Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose.

PRINCIPAL FINDINGS

We compare PCA with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE), for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; t-SNE performs better on all of them.
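As a rough illustration of this kind of comparison, the sketch below embeds a small synthetic genotype matrix in two dimensions with both methods, using scikit-learn. The simulated data, the 0/1/2 allele-count encoding, the population sizes, and every parameter value (for example `perplexity=15`) are assumptions made for this example; none of them come from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical data: 3 populations x 20 individuals, 200 SNPs coded as
# minor-allele counts (0, 1, 2). Each population has its own allele frequencies.
n_pops, n_per_pop, n_snps = 3, 20, 200
freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))
labels = np.repeat(np.arange(n_pops), n_per_pop)
genotypes = rng.binomial(2, freqs[labels]).astype(float)

# Linear projection onto the first two principal components.
pca_xy = PCA(n_components=2).fit_transform(genotypes)

# Nonlinear embedding; perplexity must be smaller than the number of samples.
tsne_xy = TSNE(n_components=2, perplexity=15, init="pca",
               random_state=0).fit_transform(genotypes)

print(pca_xy.shape, tsne_xy.shape)
```

Both results are arrays with one 2-D coordinate per individual, so they can be drawn as scatter plots coloured by `labels` and compared side by side.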

SIGNIFICANCE

To transform data, PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity.
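The abstract does not say which classifiers or validity indices were used. As a minimal sketch of the two kinds of key figures, the code below scores a 2-D embedding with leave-one-out cross-validation of a 1-nearest-neighbour classifier and with the mean silhouette index. Both choices, the function names, and the toy coordinates are assumptions made for illustration.

```python
import math

def loo_1nn_accuracy(points, labels):
    """Leave-one-out cross-validation accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, p in enumerate(points):
        # Nearest neighbour among all other points (ties go to the lowest index).
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        correct += labels[j] == labels[i]
    return correct / len(points)

def mean_silhouette(points, labels):
    """Mean silhouette index: near 1 for compact, well-separated groups."""
    scores = []
    for i, p in enumerate(points):
        by_group = {}
        for k, q in enumerate(points):
            if k != i:
                by_group.setdefault(labels[k], []).append(math.dist(p, q))
        own = by_group.pop(labels[i], None)
        if not own or not by_group:
            scores.append(0.0)  # singleton group or a single group: score 0
            continue
        a = sum(own) / len(own)                              # mean distance within own group
        b = min(sum(d) / len(d) for d in by_group.values())  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two hypothetical embeddings of the same six individuals (two populations).
labels = [0, 0, 0, 1, 1, 1]
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
mixed = [(0, 0), (10, 10), (1, 0), (0, 1), (10, 11), (11, 10)]

for name, xy in (("separated", separated), ("mixed", mixed)):
    print(name, loo_1nn_accuracy(xy, labels), round(mean_silhouette(xy, labels), 3))
```

An embedding that keeps the known populations apart scores higher on both figures, which gives a numeric basis for preferring one visualization over another.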

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2b9/3574019/19850b521bf3/pone.0056883.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2b9/3574019/353311a1597a/pone.0056883.g002.jpg
