Luo Gang
Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98109, USA
Glob Transit. 2019;1:61-82. doi: 10.1016/j.glt.2018.11.001. Epub 2019 Mar 27.
Predictive modeling based on machine learning with medical data has great potential to improve healthcare and reduce costs. However, two hurdles, among others, impede its widespread adoption in healthcare. First, medical data are by nature longitudinal. Pre-processing them, particularly for feature engineering, is labor intensive and often takes 50-80% of the model building effort. Predictive temporal features are the basis of building accurate models, but are difficult to identify. This is problematic because healthcare systems have limited resources for model building, while inaccurate models produce sub-optimal outcomes and are often useless. Second, most machine learning models provide no explanation of their prediction results, yet such explanations are essential for a model to be used in routine clinical practice. To address these two hurdles, this paper outlines: 1) a data-driven method for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling; and 2) a method of using these features to automatically explain machine learning prediction results and suggest tailored interventions. This provides a roadmap for future research.