Ranking the information content of distance measures.

Authors

Glielmo Aldo, Zeni Claudio, Cheng Bingqing, Csányi Gábor, Laio Alessandro

Affiliations

Physics Department, International School for Advanced Studies (SISSA), Via Bonomea 265, 34136 Trieste, Italy.

Bank of Italy, 00187, Italy.

Publication details

PNAS Nexus. 2022 Apr 14;1(2):pgac039. doi: 10.1093/pnasnexus/pgac039. eCollection 2022 May.

DOI:10.1093/pnasnexus/pgac039
PMID:36713323
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9802303/
Abstract

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Finding a small set of features that still retains sufficient information about the dataset is important for the successful application of many statistical learning approaches. We introduce a statistical test that can assess the relative information retained when using 2 different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This ranking can in turn be used to identify the most informative distance measure and, therefore, the most informative set of features, out of a pool of candidates. To illustrate the general applicability of our approach, we show that it reproduces the known importance ranking of policy variables for Covid-19 control, and also identifies compact yet informative descriptors for atomic structures. We further provide initial evidence that the information asymmetry measured by the proposed test can be used to infer relationships of causality between the features of a dataset. The method is general and should be applicable to many branches of science.
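The abstract describes the test only in words. As a minimal sketch, assume the rank-based "Information Imbalance" used in this line of work, Δ(A→B) = (2/N)⟨r^B | r^A = 1⟩: for every point, find its nearest neighbour under distance A and record that neighbour's rank under distance B. Values near 0 mean A predicts B's neighbourhoods; values near 1 mean A carries no information about B. Comparing Δ(A→B) with Δ(B→A) then separates the three outcomes the abstract names (equivalent, independent, one more informative). The function names, the `compare` thresholds (0.2 and 0.8) and the toy three-feature dataset are hypothetical and chosen only for illustration; this is not the authors' reference implementation.

```python
import math
import random


def neighbour_ranks(points, i):
    """Map every j != i to its distance rank from point i (1 = nearest).

    Uses Euclidean distance; ties are broken by index (an assumption)."""
    order = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return {j: rank for rank, (_, j) in enumerate(order, start=1)}


def information_imbalance(space_a, space_b):
    """Delta(A -> B) = (2/N) * <r^B | r^A = 1>.

    For each point i, take its nearest neighbour in space A and look up that
    neighbour's rank in space B; average over i and scale by 2/N."""
    n = len(space_a)
    total = 0
    for i in range(n):
        ranks_a = neighbour_ranks(space_a, i)
        nearest_in_a = min(ranks_a, key=ranks_a.get)  # the j with rank 1 in A
        total += neighbour_ranks(space_b, i)[nearest_in_a]
    return 2.0 * total / (n * n)


def compare(d_ab, d_ba, low=0.2, high=0.8):
    """Hypothetical decision rule: thresholds are illustrative, not from the paper."""
    if d_ab < low and d_ba < low:
        return "equivalent"
    if d_ab > high and d_ba > high:
        return "independent"
    return "A more informative" if d_ab < d_ba else "B more informative"


# Hypothetical toy dataset: three independent, uniformly distributed features.
random.seed(0)
N = 300
data = [(random.random(), random.random(), random.random()) for _ in range(N)]
full = data                        # distance built on all three features
only_x0 = [(p[0],) for p in data]  # distance built on feature x0 only
only_x1 = [(p[1],) for p in data]  # distance built on feature x1 only

d_full_x0 = information_imbalance(full, only_x0)
d_x0_full = information_imbalance(only_x0, full)
d_x0_x1 = information_imbalance(only_x0, only_x1)
d_x1_x0 = information_imbalance(only_x1, only_x0)

print(compare(d_full_x0, d_x0_full))  # full feature set vs. x0 alone
print(compare(d_x0_x1, d_x1_x0))      # two unrelated features
```

On this toy data the distance built on all three features should come out as more informative than the one built on x0 alone (the full set contains x0, but not vice versa), while x0 and x1 should come out as independent, with both imbalances close to 1. The brute-force sorting costs O(N² log N), which is acceptable for a demo but not for large datasets.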

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0044/9802303/432a9d8316e0/pgac039fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0044/9802303/520d9c155def/pgac039fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0044/9802303/a0948cc618ca/pgac039fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0044/9802303/d8a1007f4f1b/pgac039fig4.jpg

Similar articles

1. Ranking the information content of distance measures.
   PNAS Nexus. 2022 Apr 14;1(2):pgac039. doi: 10.1093/pnasnexus/pgac039. eCollection 2022 May.
2. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3. Small class sizes for improving student achievement in primary and secondary schools: a systematic review.
   Campbell Syst Rev. 2018 Oct 11;14(1):1-107. doi: 10.4073/csr.2018.10. eCollection 2018.
4. Technology of Informative Feature Selection for Immunosignature Analysis.
   Sovrem Tekhnologii Med. 2021;12(5):19-25. doi: 10.17691/stm2020.12.5.02. Epub 2020 Oct 28.
5. Impact of summer programmes on the outcomes of disadvantaged or 'at risk' young people: A systematic review.
   Campbell Syst Rev. 2024 Jun 13;20(2):e1406. doi: 10.1002/cl2.1406. eCollection 2024 Jun.
6. Very Important Pool (VIP) genes--an application for microarray-based molecular signatures.
   BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S9. doi: 10.1186/1471-2105-9-S9-S9.
7. Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.
   Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):5-50.
8. Planning Implications Related to Sterilization-Sensitive Science Investigations Associated with Mars Sample Return (MSR).
   Astrobiology. 2022 Jun;22(S1):S112-S164. doi: 10.1089/AST.2021.0113. Epub 2022 May 19.
9. Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach.
   IEEE Trans Cybern. 2016 Jun;46(6):1424-37. doi: 10.1109/TCYB.2015.2444435. Epub 2015 Jul 6.
10. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?
   J Cheminform. 2015 May 20;7:20. doi: 10.1186/s13321-015-0069-3. eCollection 2015.

Cited by

1. Classification and spatiotemporal correlation of dominant fluctuations in complex dynamical systems.
   PNAS Nexus. 2025 Feb 7;4(2):pgaf038. doi: 10.1093/pnasnexus/pgaf038. eCollection 2025 Feb.
2. Predicting hydrogen atom transfer energy barriers using Gaussian process regression.
   Digit Discov. 2025 Jan 10;4(2):513-522. doi: 10.1039/d4dd00174e. eCollection 2025 Feb 12.
3. Automatic feature selection and weighting in molecular systems using Differentiable Information Imbalance.
   Nat Commun. 2025 Jan 2;16(1):270. doi: 10.1038/s41467-024-55449-7.
4. Maximally informative feature selection using Information Imbalance: Application to COVID-19 severity prediction.
   Sci Rep. 2024 May 10;14(1):10744. doi: 10.1038/s41598-024-61334-6.
5. Robust inference of causality in high-dimensional dynamical processes from the Information Imbalance of distance ranks.
   Proc Natl Acad Sci U S A. 2024 May 7;121(19):e2317256121. doi: 10.1073/pnas.2317256121. Epub 2024 Apr 30.
6. Quality assessment and community detection methods for anonymized mobility data in the Italian Covid context.
   Sci Rep. 2024 Feb 26;14(1):4636. doi: 10.1038/s41598-024-54878-0.
7. Radiomics and machine learning applied to STIR sequence for prediction of quantitative parameters in facioscapulohumeral disease.
   Front Neurol. 2023 Feb 24;14:1105276. doi: 10.3389/fneur.2023.1105276. eCollection 2023.
8. DADApy: Distance-based analysis of data-manifolds in Python.
   Patterns (N Y). 2022 Sep 19;3(10):100589. doi: 10.1016/j.patter.2022.100589. eCollection 2022 Oct 14.

References

1. DADApy: Distance-based analysis of data-manifolds in Python.
   Patterns (N Y). 2022 Sep 19;3(10):100589. doi: 10.1016/j.patter.2022.100589. eCollection 2022 Oct 14.
2. Text Data Augmentation for Deep Learning.
   J Big Data. 2021;8(1):101. doi: 10.1186/s40537-021-00492-0. Epub 2021 Jul 19.
3. The effect of interventions on COVID-19.
   Nature. 2020 Dec;588(7839):E26-E28. doi: 10.1038/s41586-020-3025-y. Epub 2020 Dec 23.
4. Inferring the effectiveness of government interventions against COVID-19.
   Science. 2021 Feb 19;371(6531). doi: 10.1126/science.abd9338. Epub 2020 Dec 15.
5. Ranking the effectiveness of worldwide COVID-19 government interventions.
   Nat Hum Behav. 2020 Dec;4(12):1303-1312. doi: 10.1038/s41562-020-01009-0. Epub 2020 Nov 16.
6. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe.
   Nature. 2020 Aug;584(7820):257-261. doi: 10.1038/s41586-020-2405-7. Epub 2020 Jun 8.
7. The effect of large-scale anti-contagion policies on the COVID-19 pandemic.
   Nature. 2020 Aug;584(7820):262-267. doi: 10.1038/s41586-020-2404-8. Epub 2020 Jun 8.
8. Predicting Materials Properties with Little Data Using Shotgun Transfer Learning.
   ACS Cent Sci. 2019 Oct 23;5(10):1717-1730. doi: 10.1021/acscentsci.9b00804. Epub 2019 Sep 30.
9. Inferring causation from time series in Earth system sciences.
   Nat Commun. 2019 Jun 14;10(1):2553. doi: 10.1038/s41467-019-10105-3.
10. Information estimation using nonparametric copulas.
   Phys Rev E. 2018 Nov;98(5). doi: 10.1103/PhysRevE.98.053302. Epub 2018 Nov 5.