Detecting novel associations in large data sets.

Author information

Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication information

Science. 2011 Dec 16;334(6062):1518-24. doi: 10.1126/science.1205438.

Abstract

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R^2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
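
The abstract does not spell out how MIC is computed, so the following is only a rough sketch of the general idea, not the authors' algorithm. It bins both variables on a grid, measures the mutual information of the binned data, normalizes it by the logarithm of the smaller grid dimension, and keeps the best score over a range of grid sizes. Two simplifications are assumptions of this sketch: only equal-frequency grids are tried (the published method searches over grid placements as well), and the grid-size budget `n ** 0.6` is an assumed default rather than something stated in the abstract. The function names `equal_frequency_bins`, `mutual_information`, and `approx_mic` are hypothetical.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Label each value with a bin index 0..k-1 so bins hold roughly equal counts.

    Binning is rank based, so tied values may land in adjacent bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    return labels


def mutual_information(bx, by):
    """Mutual information, in bits, between two equally long label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    total = 0.0
    for (i, j), count in joint.items():
        total += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return total


def approx_mic(xs, ys, alpha=0.6):
    """Best normalized mutual information over equal-frequency grids.

    Assumption: grids of kx by ky cells are tried while kx * ky <= n ** alpha.
    This is a simplified stand-in for MIC, not the published estimator.
    """
    n = len(xs)
    budget = n ** alpha
    best = 0.0
    for kx in range(2, int(budget // 2) + 1):
        bx = equal_frequency_bins(xs, kx)
        for ky in range(2, int(budget // kx) + 1):
            by = equal_frequency_bins(ys, ky)
            score = mutual_information(bx, by) / math.log2(min(kx, ky))
            best = max(best, score)
    return best
```

Under these assumptions, a noiseless functional relationship (a straight line or a parabola, for instance) should score close to 1, while independent samples should score much lower. A full implementation would also need to handle heavy ties and search over grid boundaries.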

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c16/3325791/866bb476bc36/nihms358982f1.jpg
