Revealing gait as a murine biomarker of injury, disease, and age with multivariate statistics and machine learning.

Author information

Naved Bilal A, Han Shuling, Koss Kyle M, Kando Mary J, Wang Jiao-Jing, Chang Jill, Weiss Craig, Passman Maya G, Wertheim Jason A, Luo Yuan, Zhang Zheng J

Affiliations

Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, USA.

Comprehensive Transplant Center, Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Publication information

Sci Rep. 2025 Sep 29;15(1):33457. doi: 10.1038/s41598-025-02073-0.

Abstract

Hundreds of rodent gait studies have been published over the past two decades, according to a PubMed search. Treadmill gait systems such as DigiGait generate more than 30 spatial and temporal measures. Despite this multidimensional data, all but a handful of published rodent gait studies have relied on univariate analysis, which reveals little about the relationships among measures that characterize different gait states. This study applied rigorous multivariate analysis, in the form of sequential feature selection and factor analysis, to gait data from a variety of gait deviations (injury, i.e. peripheral nerve transection and transplantation; disease, i.e. intrauterine growth restriction (IUGR) and hyperoxia; and age-related changes) and used machine learning to train a classifier to distinguish among and score different gait states.
Treadmill gait data for three types of gait deviation were collected from B6 mice using the DigiGait system, with measurements taken at standardized treadmill speeds of 10, 17, and 24 cm/s over 3-4 s per observation. Each mouse underwent at least two trials at each speed. The mice were either healthy or had a gait deficit due to: (a) a peripheral nerve injury model with increasing degrees of damage to the neuromusculoskeletal sequence of gait, i.e. nerve transection and total hind limb transplantation; (b) a central nerve injury model with increasing degrees of damage to the motor regions responsible for gait, i.e. IUGR and IUGR + hyperoxia; or (c) increasing age. Multivariate factor analysis (using MATLAB's factoran) and forward feature selection (with ten-fold cross-validation) were conducted to identify the features and factors most descriptive of each gait state for comparison. Various machine learning classifier models (e.g. random forest, regression, discriminant analysis, support vector machine, and ensemble) were trained with ten-fold cross-validation and evaluated on a 70-30 training-testing split for accuracy, precision, recall, and F-score. The highest-performing model was used to score each type of gait on a scale of -0.5 to 0.5, and the score distributions were plotted as histograms for direct comparison among gait states.
Multivariate feature selection revealed that not all of the 30+ features were relevant to describing the gait states. Plotting misclassification error (MCE) against the number of features included revealed a critical number of features (~16) that minimized MCE (0.17 via univariate feature selection vs. 0.12 via multivariate feature selection). Including more than 16 features caused MCE to increase linearly, indicating overfitting. Factor analysis clarified the relationships among the identified features, and its results were consistent with the biological differences between the groups (e.g. total hind limb transplantation was distinguishable via features describing the position of the paw relative to the body, while nerve transection injury alone was distinguishable via features describing changes to fine motor movements). Across all gait states, features and factors were significantly conserved, which suggests that certain relationships may be fundamental to rodent gait analysis regardless of the gait pathology in question. The highest-performing classifier (ensemble) distinguished between gait deficits with high performance (F-score, recall, precision, and accuracy all > 0.90). This included distinguishing peripheral from central gait deficits, individual types of peripheral deficit, individual types of central deficit, and younger from older animals.
Using the classifier to score individual animals and plotting the scores by group revealed score distributions that were consistent with biological phenomena. For example, the trends in multivariate gait score with increasing central nerve injury were consistent with the trends in white matter volume loss in relevant motor regions of the brain as measured via MRI. Finally, the degree of separation between multivariate gait scores was consistent with the degree of biological difference between gaits (e.g. central injury was separated further from healthy than peripheral injury was; older and younger animals showed more moderate, yet still statistically significant, separation in scores than any of the injury or disease states did from each other). In conclusion, this study establishes a new methodology to quantify and evaluate gait deviations across a variety of models. Its novelty lies in using multivariate statistics to describe the features and factors that characterize gait states due to injury, disease, and age for use in machine learning model training. This includes statistically describing the differences in gait between diseases whose gait deficits have vastly different etiologies (peripheral vs. central). The methodology also accounts for relationships between groupings of features in model training, something that traditional univariate analysis cannot do. The study used multivariate statistics and machine learning to reveal gait as a quantifiable, preclinical biomarker of injury, disease, and age. It collapsed a multidimensional biological phenomenon (gait) into a single score by encoding the revealed biological relationships, allowing direct, quantifiable comparisons of function as it pertains to ambulation. It also showed how these multivariate gait scores can visualize biologically consistent separation and combined effects.
Finally, we demonstrate the application of this methodology to an already published univariate study that is representative of the hundreds of univariate treadmill gait analyses published over the last two decades, thereby opening the door to a new class of multivariate gait analyses that provide greater insight and value than the current state of the art.
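
The forward feature selection with ten-fold cross-validation described in the abstract can be illustrated with a minimal sketch. This is not the study's MATLAB pipeline. It assumes synthetic data in place of DigiGait measurements, a nearest-centroid classifier chosen only for brevity, and hypothetical helper names (make_data, cv_mce, forward_selection_path). It greedily adds one feature at a time, records the cross-validated misclassification error (MCE) at each step, and picks the feature count that minimizes MCE.

```python
import random

def make_data(n=200, n_features=8, n_informative=3, seed=0):
    """Synthetic stand-in for gait data (assumption): only the first
    n_informative features have class-dependent means."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_predict(X_train, y_train, X_test, cols):
    """Nearest-centroid classifier restricted to the feature columns in cols."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    preds = []
    for x in X_test:
        dist = {label: sum((x[c] - m) ** 2 for c, m in zip(cols, cen))
                for label, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def cv_mce(X, y, cols, k=10):
    """Misclassification error pooled over k cross-validation folds."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        preds = centroid_predict(X_tr, y_tr, X_te, cols)
        errors += sum(p != t for p, t in zip(preds, y_te))
    return errors / n

def forward_selection_path(X, y, k=10):
    """Greedy forward selection: at each step add the feature that gives
    the lowest cross-validated MCE, and record the MCE after each addition."""
    remaining = list(range(len(X[0])))
    selected, path = [], []
    while remaining:
        mce, col = min((cv_mce(X, y, selected + [c], k), c) for c in remaining)
        selected.append(col)
        remaining.remove(col)
        path.append((list(selected), mce))
    return path

X, y = make_data()
path = forward_selection_path(X, y)
# The step with the lowest MCE gives the feature subset to keep.
best_cols, best_mce = min(path, key=lambda step: step[1])
for cols, mce in path:
    print(len(cols), "features -> MCE", round(mce, 3))
print("selected feature indices:", best_cols)
```

Printing the path gives MCE as a function of the number of features, which is the kind of curve the abstract uses to identify a critical feature count. On real gait data the classifier, fold assignment, and stopping rule would need to match the study's actual settings, which the abstract does not fully specify.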
