Studies in the use of data mining, prediction algorithms, and a universal exchange and inference language in the analysis of socioeconomic health data.

Affiliation

Ingine Inc., Virginia, USA, and the Dirac Foundation, Oxfordshire, UK.

Publication information

Comput Biol Med. 2019 Sep;112:103369. doi: 10.1016/j.compbiomed.2019.103369. Epub 2019 Jul 25.

Abstract

While clinical and biomedical information in digital form has been escalating, it is socioeconomic factors that are important determinants of health on the national and global scale. We show how collective use of data mining and prediction algorithms to analyze socioeconomic population health data can stand beside classical correlation analysis in routine data analysis. The underlying theoretical basis is the Dirac notation and algebra that is a scientific standard but unusual outside of the physical sciences, combined with a theory of expected information first developed for analyzing sparse data but still largely confined to bioinformatics. The latter was important here because the records analyzed (which are for US counties and equivalents, not patients) are very few by contemporary data mining standards. The approach is very unlikely to be familiar to socioeconomic researchers, so the theory and the advantages of our inference nets over the Bayes Net are reviewed here, mostly using socioeconomic examples. While our expertise and focus is in regard to novel analytical methods rather than socioeconomics per se, a significant negative (countertrending) relationship between population health and equity was initially surprising, at least to the present authors. This encouraged deeper exploration including that of the relationship between our data mining methods and traditional Pearson's correlation. The latter is susceptible to giving wrong conclusions if a phenomenon called Simpson's paradox applies, so this is also investigated. Also discussed is that, even for very few records, associative data mining can still demand significant computational resources due to a combinatorial explosion.
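The abstract's caution about Pearson's correlation and Simpson's paradox can be illustrated with a minimal sketch. The numbers below are invented toy values, not data from the paper, and the names `group_a`, `group_b`, and `pearson` are hypothetical. The sketch assumes two strata of records in which the variables are negatively related within each stratum, while the pooled records show a positive correlation.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented toy strata (assumption, not from the paper): (x values, y values).
group_a = ([1, 2, 3], [3, 2, 1])
group_b = ([6, 7, 8], [8, 7, 6])

# Correlation within each stratum is negative.
r_a = pearson(*group_a)
r_b = pearson(*group_b)

# Pooling the strata hides the grouping and reverses the sign.
pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
r_pooled = pearson(pooled_x, pooled_y)

print(f"within A: {r_a:.3f}, within B: {r_b:.3f}, pooled: {r_pooled:.3f}")
```

In this toy case the sign of the pooled coefficient is the opposite of the sign within either stratum, which is the kind of wrong conclusion the abstract says a correlation analysis can reach when Simpson's paradox applies.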

Similar articles

Studies in the use of data mining, prediction algorithms, and a universal exchange and inference language in the analysis of socioeconomic health data.

Comput Biol Med. 2019 Sep;112:103369. doi: 10.1016/j.compbiomed.2019.103369. Epub 2019 Jul 25.

Implementation of a web based universal exchange and inference language for medicine: Sparse data, probabilities and inference in data mining of clinical data repositories.

Comput Biol Med. 2015 Nov 1;66:82-102. doi: 10.1016/j.compbiomed.2015.07.015. Epub 2015 Jul 28.

Extension of the Quantum Universal Exchange Language to precision medicine and drug lead discovery. Preliminary example studies using the mitochondrial genome.

Comput Biol Med. 2020 Feb;117:103621. doi: 10.1016/j.compbiomed.2020.103621. Epub 2020 Jan 20.

Studies in the extensively automatic construction of large odds-based inference networks from structured data. Examples from medical, bioinformatics, and health insurance claims data.

Comput Biol Med. 2018 Apr 1;95:147-166. doi: 10.1016/j.compbiomed.2018.02.013. Epub 2018 Mar 21.

Suggestions for a Web based universal exchange and inference language for medicine.

Comput Biol Med. 2013 Dec;43(12):2297-310. doi: 10.1016/j.compbiomed.2013.09.010. Epub 2013 Sep 20.

Hyperbolic Dirac Nets for medical decision support. Theory, methods, and comparison with Bayes Nets.

Comput Biol Med. 2014 Aug;51:183-97. doi: 10.1016/j.compbiomed.2014.03.014. Epub 2014 Apr 8.

A Survey of Data Mining and Deep Learning in Bioinformatics.

J Med Syst. 2018 Jun 28;42(8):139. doi: 10.1007/s10916-018-1003-9.

Data-mining to build a knowledge representation store for clinical decision support. Studies on curation and validation based on machine performance in multiple choice medical licensing examinations.

Comput Biol Med. 2016 Jun 1;73:71-93. doi: 10.1016/j.compbiomed.2016.02.010. Epub 2016 Feb 26.

Mapping chemical structure-activity information of HAART-drug cocktails over complex networks of AIDS epidemiology and socioeconomic data of U.S. counties.

Biosystems. 2015 Jun;132-133:20-34. doi: 10.1016/j.biosystems.2015.04.007. Epub 2015 Apr 24.

Mining real-world high dimensional structured data in medicine and its use in decision support. Some different perspectives on unknowns, interdependency, and distinguishability.

Comput Biol Med. 2022 Feb;141:105118. doi: 10.1016/j.compbiomed.2021.105118. Epub 2021 Dec 11.

Cited by

Towards faster response against emerging epidemics and prediction of variants of concern.

Inform Med Unlocked. 2022;31:100966. doi: 10.1016/j.imu.2022.100966. Epub 2022 May 20.

A scoping review on the use of machine learning in research on social determinants of health: Trends and research prospects.

SSM Popul Health. 2021 Jun 5;15:100836. doi: 10.1016/j.ssmph.2021.100836. eCollection 2021 Sep.

Computers and viral diseases. Preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus.

Comput Biol Med. 2020 Apr;119:103670. doi: 10.1016/j.compbiomed.2020.103670. Epub 2020 Feb 26.
