Interpretable machine learning analysis to identify risk factors for diabetes using the anonymous living census data of Japan.

Author information

Jiang Pei, Suzuki Hiroyuki, Obi Takashi

Affiliation

Course of Information and Communication, Department of Engineer, Tokyo Institute of Technology, Kanagawa, Japan.

Present Address: 4259 Nagatsutachou, Midori Ward, Yokohama, Kanagawa, 226-0026 Japan.

Publication information

Health Technol (Berl). 2023;13(1):119-131. doi: 10.1007/s12553-023-00730-w. Epub 2023 Jan 26.

DOI:10.1007/s12553-023-00730-w

PMID:36718178

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9876749/

Abstract

PURPOSE

Diabetes mellitus causes a wide range of health problems. Even amid today's boom in big data, some risk factors for diabetes likely remain undiscovered. To identify new risk factors for diabetes in a big-data society and to explore more efficient uses of big data, the non-objective-oriented census data of the Japanese Citizen's Survey of Living Conditions were analyzed using interpretable machine learning methods.

METHODS

Seven interpretable machine learning methods were used to analyze Japanese citizens' census data. First, logistic regression analysis was used to identify risk factors for diabetes among 19 initially selected factors. Then, linear analysis, linear discriminant analysis, Hayashi's quantification method type II, random forest, XGBoost, and SHAP were used to cross-check these results and to compare the contributions of the individual factors. Finally, the relationships among the factors themselves were analyzed.
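
As an illustration of the first step only, the sketch below fits a maximum-likelihood logistic regression by Newton-Raphson (iteratively reweighted least squares) in plain Python and reports each factor's odds ratio, exp(β). It is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic binary indicators, and the names x1, x2, x3, their true coefficients, and the helpers `solve`, `fit_logistic`, and `simulate` are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_logistic(X, y, iters=25, tol=1e-10):
    """Logistic regression MLE via Newton-Raphson; returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    w = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score vector X^T (y - p)
        hess = [[0.0] * k for _ in range(k)]  # information matrix X^T W X
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = solve(hess, grad)
        w = [wj + sj for wj, sj in zip(w, step)]
        if max(abs(s) for s in step) < tol:
            break
    return w


def simulate(n=2000, seed=0):
    """Synthetic data, NOT the census data: true logit = -2.0 + 1.2*x1 + 0.6*x2 + 0.0*x3."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(3)]
        z = -2.0 + 1.2 * x[0] + 0.6 * x[1] + 0.0 * x[2]
        X.append(x)
        y.append(1.0 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0.0)
    return X, y


X, y = simulate()
coef = fit_logistic(X, y)
odds_ratios = {name: math.exp(b) for name, b in zip(["x1", "x2", "x3"], coef[1:])}
print({name: round(v, 2) for name, v in odds_ratios.items()})
```

With this simulated design, x1 and x2 should come out with odds ratios above 1 and x3 near 1. In a real analysis, standard errors or confidence intervals would be needed before labeling a factor a risk factor.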

RESULTS

Four new risk factors for diabetes mellitus were identified for the first time: the number of family members, insurance type, public pension type, and health awareness level. Another 11 risk factors were reconfirmed in this analysis. In particular, in some of the interpretable models, insurance type and health awareness level contributed more to diabetes than hypertension, hyperlipidemia, and stress did. We also found that years of work emerged as a risk factor for diabetes because of their high correlation coefficient with age, itself a risk factor.
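
The abstract attributes the work-years signal to its high correlation with age. The sketch below illustrates that mechanism on synthetic data only, using Pearson and first-order partial correlations: the outcome is generated from age alone (an assumption of the simulation), yet work years show a clear marginal correlation with it, which drops to roughly zero once age is controlled for. The variable names, distributions, and coefficients are hypothetical and are not taken from the census data or the paper's models.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)
age, work_years, diabetes = [], [], []
for _ in range(5000):
    a = rng.uniform(30, 75)
    # Hypothetical: work years track age closely (career start near 22, small noise).
    w = max(0.0, a - 22 + rng.gauss(0, 2))
    # By construction, diabetes depends on age ONLY, never on work years.
    p = 1.0 / (1.0 + math.exp(-(-5.0 + 0.06 * a)))
    age.append(a)
    work_years.append(w)
    diabetes.append(1.0 if rng.random() < p else 0.0)

r_age_work = pearson(age, work_years)
r_work_diab = pearson(work_years, diabetes)
r_age_diab = pearson(age, diabetes)
# x = work years, y = diabetes, z = age
r_work_diab_given_age = partial_corr(r_work_diab, r_age_work, r_age_diab)
print(round(r_age_work, 3), round(r_work_diab, 3), round(r_work_diab_given_age, 3))
```

A model that sees work years but not age would therefore pick up an apparent risk factor; checking correlations among candidate factors, as the study's final step does, helps expose such cases.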

CONCLUSIONS

New risk factors for diabetes mellitus were identified from Japan's non-objective-oriented anonymous census data using interpretable machine learning models. These newly identified risk factors suggest possible new policies for preventing diabetes. Moreover, our analysis demonstrates that big data can help us find useful knowledge in today's prosperous society. Our study also paves the way for identifying more risk factors and improving the efficiency of big data use.
