Research and Analytics, Collective Health, San Francisco, CA.
Center for Primary Care, Harvard Medical School, Boston, MA.
Ethn Dis. 2020 Apr 2;30(Suppl 1):217-228. doi: 10.18865/ed.30.S1.217. eCollection 2020.
Precision medicine research designed to reduce health disparities often involves studying multi-level datasets to understand how diseases manifest disproportionately in one group over another, and how scarce health care resources can be directed precisely to those most at risk for disease. In this article, we provide a structured tutorial for medical and public health researchers on applying machine learning methods to precision medicine research designed to reduce health disparities. We review key terms and concepts for understanding machine learning papers, including supervised and unsupervised learning, regularization, cross-validation, bagging, and boosting. We then review metrics for evaluating machine learners, as well as major families of learning approaches, including tree-based learning, deep learning, and ensemble learning. We highlight the advantages and disadvantages of different learning approaches, describe strategies for interpreting "black box" models, and demonstrate the application of common methods on an example dataset, with open-source statistical code in R.