Tabashum Thasina, Snyder Robert Cooper, O'Brien Megan K, Albert Mark V
Department of Computer Science and Engineering, University of North Texas, Denton, TX, United States.
Technology and Innovation Hub, Shirley Ryan AbilityLab, Chicago, IL, United States.
JMIR Med Inform. 2024 May 17;12:e50117. doi: 10.2196/50117.
With the growing availability of data, computing resources, and easier-to-use software libraries, machine learning (ML) is increasingly used in disease detection and prediction, including for Parkinson disease (PD). Despite the large number of studies published every year, very few ML systems have been adopted for real-world use. In particular, a lack of external validity may result in poor performance of these systems in clinical practice. Additional methodological issues in ML design and reporting can also hinder clinical adoption, even for applications that would benefit from such data-driven systems.
To sample the current ML practices in PD applications, we conducted a systematic review of studies published in 2020 and 2021 that used ML models to diagnose PD or track PD progression.
We conducted a systematic literature review in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, searching PubMed for studies published between January 2020 and April 2021 with the following exact string: "Parkinson's" AND ("ML" OR "prediction" OR "classification" OR "detection" or "artificial intelligence" OR "AI"). The search returned 1085 publications. After screening and review, we identified 113 publications that used ML for the classification or regression-based prediction of PD or PD-related symptoms.
Only 65.5% (74/113) of studies used a holdout test set to avoid potentially inflated accuracies, and approximately half (25/46, 54%) of the studies without a holdout test set did not state this as a potential concern. Surprisingly, 38.9% (44/113) of studies did not report how, or whether, models were tuned, and an additional 27.4% (31/113) used ad hoc model tuning, which is generally discouraged in ML model optimization because it is hard to reproduce and can overfit the evaluation data. Only 15% (17/113) of studies directly compared their results with those of other models, severely limiting the interpretation of results.
This review highlights the notable limitations of current ML systems and techniques that may contribute to a gap between reported performance in research and the real-life applicability of ML models aiming to detect and predict diseases such as PD.
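As an illustration of the two practices the results emphasize (a holdout test set and systematic rather than ad hoc tuning), the following is a minimal sketch and not a method taken from any reviewed study. The synthetic data, the k-nearest-neighbor classifier, the candidate values of k, and all function names are assumptions chosen only to keep the example self-contained. The hyperparameter is selected by cross-validation on the training portion alone, and the holdout set is scored once at the end.

```python
import random

def make_synthetic_data(n, seed=0):
    """Hypothetical two-class data: one noisy feature per sample."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        feature = rng.gauss(1.0 if label else -1.0, 1.0)
        data.append((feature, label))
    rng.shuffle(data)
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training samples (k is odd)."""
    neighbors = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0

def accuracy(train, evaluation, k):
    """Fraction of evaluation samples classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)

def cross_validated_accuracy(train, k, n_folds=5):
    """Mean accuracy over folds drawn only from the training set."""
    scores = []
    for fold in range(n_folds):
        val = train[fold::n_folds]
        fit = [s for i, s in enumerate(train) if i % n_folds != fold]
        scores.append(accuracy(fit, val, k))
    return sum(scores) / n_folds

# Set the holdout test set aside before any tuning takes place.
data = make_synthetic_data(200)
split = int(0.8 * len(data))
train_set, test_set = data[:split], data[split:]

# Systematic tuning: every candidate is scored the same way, on training data only.
candidate_ks = [1, 3, 5, 7, 9]
cv_scores = {k: cross_validated_accuracy(train_set, k) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)

# The holdout set is used exactly once, after the hyperparameter is fixed.
test_accuracy = accuracy(train_set, test_set, best_k)
print(f"best k = {best_k}, holdout accuracy = {test_accuracy:.3f}")
```

Because the test samples play no part in choosing k, the final accuracy is not inflated by the tuning step, which is the concern raised for studies without a holdout set.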