Am J Epidemiol. 2021 Aug 1;190(8):1476-1482. doi: 10.1093/aje/kwab047.
Machine learning is gaining prominence in the health sciences, where much of its use has focused on data-driven prediction. However, machine learning can also be embedded within causal analyses, potentially reducing biases arising from model misspecification. Using a question-and-answer format, we provide an introduction and orientation for epidemiologists interested in using machine learning but concerned about potential bias or loss of rigor due to use of "black box" models. We conclude with sample software code that may lower the barrier to entry to using these techniques.