Studies in the use of data mining, prediction algorithms, and a universal exchange and inference language in the analysis of socioeconomic health data.

Affiliations

Ingine Inc., Virginia, USA, and the Dirac Foundation, Oxfordshire, UK.

Publication information

Comput Biol Med. 2019 Sep;112:103369. doi: 10.1016/j.compbiomed.2019.103369. Epub 2019 Jul 25.

Abstract

While clinical and biomedical information in digital form has been escalating, it is socioeconomic factors that are important determinants of health on the national and global scale. We show how collective use of data mining and prediction algorithms to analyze socioeconomic population health data can stand beside classical correlation analysis in routine data analysis. The underlying theoretical basis is the Dirac notation and algebra that is a scientific standard but unusual outside of the physical sciences, combined with a theory of expected information first developed for analyzing sparse data but still largely confined to bioinformatics. The latter was important here because the records analyzed (which are for US counties and equivalents, not patients) are very few by contemporary data mining standards. The approach is very unlikely to be familiar to socioeconomic researchers, so the theory and the advantages of our inference nets over the Bayes Net are reviewed here, mostly using socioeconomic examples. While our expertise and focus is in regard to novel analytical methods rather than socioeconomics per se, a significant negative (countertrending) relationship between population health and equity was initially surprising, at least to the present authors. This encouraged deeper exploration including that of the relationship between our data mining methods and traditional Pearson's correlation. The latter is susceptible to giving wrong conclusions if a phenomenon called Simpson's paradox applies, so this is also investigated. Also discussed is that, even for very few records, associative data mining can still demand significant computational resources due to a combinatorial explosion.
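The abstract's remark about Pearson's correlation and Simpson's paradox can be illustrated with a minimal sketch. The numbers below are synthetic and purely illustrative; they are not taken from the paper's county data, and the `pearson` helper and group names are hypothetical. The sketch shows two subgroups in which x and y trend upward together, while the pooled data trend downward.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic, illustrative data only (an assumption, not from the study).
# Within each group, y rises with x.
group_a_x, group_a_y = [1, 2, 3, 4], [10, 11, 12, 13]
group_b_x, group_b_y = [6, 7, 8, 9], [2, 3, 4, 5]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)

# Pooling the groups hides the grouping variable and reverses the trend.
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"group A r = {r_a:.3f}")
print(f"group B r = {r_b:.3f}")
print(f"pooled  r = {r_pooled:.3f}")
```

Both within-group coefficients are positive while the pooled coefficient is negative, which is the kind of reversal that can lead a pooled correlation analysis to a wrong conclusion.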
