Informational rescaling of PCA maps with application to genetic distance.

Author information

Taleb Nassim Nicholas, Zalloua Pierre, Elbassioni Khaled, Hatzikirou Haralampos, Henschel Andreas, Platt Daniel E

Affiliations

Risk Engineering, School of Engineering, New York, USA.

Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut, Beirut, Lebanon.

Publication information

Comput Struct Biotechnol J. 2024 Dec 11;27:48-56. doi: 10.1016/j.csbj.2024.11.042. eCollection 2025.

DOI:10.1016/j.csbj.2024.11.042
PMID:39802212
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11719279/
Abstract

Principal Component Analysis (PCA) is a powerful multivariate tool allowing the projection of data in low-dimensional representations. Nevertheless, datapoint distances on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based on mutual information (MI). Moreover, we show that in certain instances our proposed scaled PCA can improve cluster identification. Rescaling principal component-based distances using MI results in a representation of relative statistical associations when, as in genetics, it is applied on bit measurements between individuals' genomic mutual information. This entropy-rescaled PCA, while preserving order relationships (along a dimension), quantifies relative distances into information units, such as "bits". We illustrate the effect of this rescaling using genomics data derived from world populations and describe how the interpretation of results is impacted.
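To make the quantities in the abstract more concrete, here is a minimal Python/NumPy sketch of two standard ingredients, assuming NumPy is available: a plain PCA projection computed via SVD, and the textbook closed form for the mutual information between two jointly Gaussian variables with correlation coefficient ρ, I = −½ log₂(1 − ρ²) bits. The function names `pca_scores` and `gaussian_mi_bits` are hypothetical. This is not the authors' rescaling heuristic, which the abstract does not spell out; it only illustrates why, under Gaussian assumptions, an information scale is an order-preserving but strongly nonlinear function of correlation strength.

```python
import numpy as np

# Illustrative sketch only: these helpers are hypothetical and are NOT the
# authors' rescaling heuristic. They show (1) a standard PCA projection and
# (2) the closed-form mutual information of a bivariate Gaussian.


def pca_scores(X, k=2):
    """Project the rows of X onto the first k principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each variable (column)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # n x k matrix of PC scores


def gaussian_mi_bits(rho):
    """Mutual information (bits) between two jointly Gaussian variables
    with correlation coefficient rho: I = -0.5 * log2(1 - rho**2)."""
    rho = np.asarray(rho, dtype=float)
    if np.any(np.abs(rho) >= 1.0):
        raise ValueError("|rho| must be < 1; MI diverges as |rho| -> 1")
    return -0.5 * np.log2(1.0 - rho**2)


if __name__ == "__main__":
    # MI grows monotonically but nonlinearly with |rho|.
    for r in (0.1, 0.5, 0.9, 0.99):
        print(f"rho={r:<5} MI={float(gaussian_mi_bits(r)):.4f} bits")

    # PCA map of a small random dataset (synthetic data, for illustration).
    X = np.random.default_rng(0).normal(size=(100, 5))
    print(pca_scores(X, k=2).shape)
```

For example, ρ = √3/2 ≈ 0.866 corresponds to exactly 1 bit, while ρ = 0.1 carries less than 0.01 bit; an information-based scale makes this nonlinearity explicit while keeping the ordering of correlation magnitudes intact.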

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06d/11719279/4e167d69169d/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06d/11719279/0b6f110a7b30/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06d/11719279/92191871db5e/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06d/11719279/f2f0486bbd5a/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06d/11719279/7038ff870e79/gr005.jpg

Similar articles

1
Informational rescaling of PCA maps with application to genetic distance.
Comput Struct Biotechnol J. 2024 Dec 11;27:48-56. doi: 10.1016/j.csbj.2024.11.042. eCollection 2025.
2
Geometric Insights into the Multivariate Gaussian Distribution and Its Entropy and Mutual Information.
Entropy (Basel). 2023 Aug 7;25(8):1177. doi: 10.3390/e25081177.
3
MIA: Mutual Information Analyzer, a graphic user interface program that calculates entropy, vertical and horizontal mutual information of molecular sequence sets.
BMC Bioinformatics. 2015 Dec 10;16:409. doi: 10.1186/s12859-015-0837-0.
4
How to reduce dimension with PCA and random projections?
IEEE Trans Inf Theory. 2021 Dec;67(12):8154-8189. doi: 10.1109/tit.2021.3112821. Epub 2021 Sep 14.
5
PCA and multidimensional visualization techniques united to aid in the bioindication of elements from transplanted Sphagnum palustre moss exposed in the Gdańsk City area.
Environ Sci Pollut Res Int. 2008 Jan;15(1):41-50. doi: 10.1065/espr2007.05.422.
6
Nonlinear mapping technique for data visualization and clustering assessment of LIBS data: application to ChemCam data.
Anal Bioanal Chem. 2011 Jul;400(10):3247-60. doi: 10.1007/s00216-011-4747-3. Epub 2011 Feb 18.
7
Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management.
Entropy (Basel). 2019 Apr 19;21(4):419. doi: 10.3390/e21040419.
8
Sequential projection pursuit using genetic algorithms for data mining of analytical data.
Anal Chem. 2000 Jul 1;72(13):2846-55. doi: 10.1021/ac0000123.
9
An Informational Theoretical Approach to the Entropy of Liquids and Solutions.
Entropy (Basel). 2018 Jul 9;20(7):514. doi: 10.3390/e20070514.
10
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies.
PLoS One. 2011 Jan 4;6(1):e14373. doi: 10.1371/journal.pone.0014373.

Cited by

1
DoliClock: a lipid-based aging clock reveals accelerated aging in neurological disorders.
Aging (Albany NY). 2025 Jun 4;17(6):1405-1428. doi: 10.18632/aging.206266.
