The revolution in digitizing medical data, along with high-throughput biology, has produced unprecedented numbers of potential prognostic variables for health outcomes. Recent research suggests that clinical data can support continually updated, accurate prognoses to optimize medical interventions. We address gaps in the methodology available for estimating causal impacts in acute trauma settings using machine learning, and we tailor this methodological research to the goal of using complex clinical data to analyze potential outcomes in acute trauma patients.
Our primary aims were the following:
1. Develop diagnostic (prediction) scores via SuperLearner for each of the trauma studies, and estimate the relative prediction accuracy of competing algorithms via cross-validation tools, such as cross-validated area under the curve (see the sketch following these aims).
2. Develop and evaluate variable importance estimators, which can potentially quantify the differential impact of treatments within covariate groups.
3. Develop and evaluate algorithms and software for deriving optimal treatment rules using machine learning methods.
We had several supplemental aims:
1. Create new R packages for the analysis functions developed for the aims listed above.
2. Develop a statistical framework for data-adaptive parameters, providing estimators and methods for deriving inference for a very general class of estimators that rely on data-derived parameters.
3. Develop a data-adaptive parameter for variable importance measures.
4. Derive a method for estimating the variance in treatment effects across covariate groups.
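As a minimal sketch of primary aim 1, the following R fragment fits a SuperLearner ensemble and estimates its cross-validated area under the curve; the simulated data and the small candidate library are illustrative placeholders rather than the learners used in the trauma analyses.

    # Hypothetical example: SuperLearner prediction with cross-validated AUC.
    library(SuperLearner)
    library(cvAUC)

    set.seed(27)
    n <- 200
    x <- data.frame(age = rnorm(n, 45, 15), sbp = rnorm(n, 120, 20))
    y <- rbinom(n, 1, plogis(-4 + 0.05 * x$age + 0.01 * x$sbp))

    # Candidate library; the ensemble weights are chosen by cross-validation.
    lib <- c("SL.mean", "SL.glm")
    sl <- SuperLearner(Y = y, X = x, family = binomial(), SL.library = lib)

    # Cross-validated ensemble predictions, then CV-AUC with a confidence interval.
    cv_sl <- CV.SuperLearner(Y = y, X = x, family = binomial(), V = 10,
                             SL.library = lib)
    ci.cvAUC(predictions = cv_sl$SL.predict, labels = y, folds = cv_sl$folds)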
Our statistical methodology is based on 3 developments: (1) SuperLearning for optimizing prediction, (2) causal structural models to define targeted parameters, and (3) targeted maximum likelihood estimation (TMLE). Research in statistical methodology might be ordered as the following process: parameter (question) → theory → estimator → evaluation in finite samples (simulations) → software → application in empirical analyses. For some of our approaches, we started at the end (empirical analysis of existing data); for others, we had to start with a new theoretical framework (eg, data-adaptive parameter estimation). We used the Targeted Learning framework in the methods development and performed limited simulations to examine where finite sample performance departs from the asymptotic behavior of the estimators. We also applied the estimators to several data sets (Activation of Coagulation and Inflammation in Trauma program, n = 467; Prospective Observational Multicenter Major Trauma Transfusion study, n = 977; Pragmatic, Randomized Optimal Platelet and Plasma Ratios trial, n = 660) as a further check on their performance and on the modifications necessary to handle messy data.
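To make the TMLE step concrete, here is a minimal sketch using the tmle R package on simulated data, assuming a binary treatment A, a binary outcome Y, and baseline covariates W; the data-generating mechanism and learner libraries below are assumptions for illustration, not the project's analysis code.

    # Hypothetical example: targeted maximum likelihood estimation of the
    # average treatment effect, with SuperLearner fitting both the outcome
    # regression (Q) and the treatment mechanism (g).
    library(tmle)

    set.seed(27)
    n <- 500
    W <- data.frame(W1 = rnorm(n), W2 = rbinom(n, 1, 0.5))
    A <- rbinom(n, 1, plogis(0.3 * W$W1))
    Y <- rbinom(n, 1, plogis(-1 + A + 0.5 * W$W1 - 0.4 * W$W2))

    fit <- tmle(Y = Y, A = A, W = W, family = "binomial",
                Q.SL.library = c("SL.mean", "SL.glm"),
                g.SL.library = c("SL.mean", "SL.glm"))
    fit$estimates$ATE  # point estimate, confidence interval, and P value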
Software for each of the goals has been created, with corresponding open-source R packages available for download. We developed general, parallelizable software for cross-validation (origami), data-adaptive derivation of treatment rules with the relevant intervention parameters (opttx), and data-adaptive variable importance measures (varImpact). In addition to using the data analyses to demonstrate the practical performance of these new methods, we also used them to demonstrate the potential of existing, but not widely used, tools for prediction (SuperLearning) and variable importance (TMLE). Finally, we developed and estimated a new parameter related to treatment impact heterogeneity, yielding a statistic that summarizes the potential for improved health with more precisely tuned treatment regimens.
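As an illustration of the origami cross-validation interface named above, the sketch below computes a 10-fold cross-validated mean squared error for a toy regression; the fold-wise function cv_mse and the simulated data are hypothetical examples, not project code.

    # Hypothetical example: fold-wise computation with origami. A cv_fun
    # receives one fold and returns named results for that fold; origami
    # handles the bookkeeping and can run folds in parallel.
    library(origami)

    set.seed(27)
    dat <- data.frame(x = rnorm(100))
    dat$y <- 2 * dat$x + rnorm(100)

    cv_mse <- function(fold, data) {
      train <- training(data)    # rows assigned to training in this fold
      valid <- validation(data)  # held-out rows for this fold
      fit <- lm(y ~ x, data = train)
      pred <- predict(fit, newdata = valid)
      list(mse = mean((valid$y - pred)^2))
    }

    folds <- make_folds(dat, V = 10)
    results <- cross_validate(cv_mse, folds, data = dat)
    mean(results$mse)  # cross-validated MSE averaged over folds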
The results show great promise for incorporating health record data into individualized trauma patient care as well as into more generalized trauma care settings.
Given that data collection in the studies used for this research was somewhat incomplete, these algorithms will require future modifications to handle densely sampled, time-structured data (eg, streaming vital sign data).