Health Professions, Manchester Metropolitan University, Manchester, UK.
Centre for Research and Interdisciplinarity (CRI), Université Paris Descartes, Paris, Île-de-France, France.
BMJ Open. 2020 Mar 23;10(3):e034568. doi: 10.1136/bmjopen-2019-034568.
We conducted a systematic review assessing the reporting quality of studies validating models based on machine learning (ML) for clinical diagnosis, with a specific focus on the reporting of information concerning the participants on whom the diagnostic task was evaluated.
Medline Core Clinical Journals were searched for studies published between July 2015 and July 2018. Two reviewers independently screened the retrieved articles, and a third reviewer resolved any discrepancies. An extraction list was developed from the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline. Two reviewers independently extracted the data from the eligible articles. A third and fourth reviewer checked and verified the extracted data and resolved any discrepancies between the reviewers.
The search yielded 161 papers, of which 28 met the eligibility criteria. Details of the data source were reported in 24 of the 28 papers. In all of the papers, the set of patients on which the ML-based diagnostic system was evaluated was partitioned from a larger dataset, and the method used to derive this set was always reported. Information on the diagnostic/non-diagnostic classification was well reported (23/28). The least reported items were the use of a reporting guideline (0/28), the distribution of disease severity (8/28), a patient flow diagram (10/28) and the distribution of alternative diagnoses (10/28). A large proportion of studies (23/28) had a delay between the conduct of the reference standard and the ML test, one study did not, and four studies were unclear. For 15 studies, it was unclear whether the evaluation group corresponded to the setting in which the ML test would be applied.
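To illustrate the evaluation-set partitioning described above, the following minimal Python sketch holds out a patient-level test set from a larger dataset using scikit-learn. This is not taken from any of the reviewed studies; the column names, data and split parameters are assumptions made for the example. Splitting at the patient rather than record level keeps the held-out evaluation set independent of the training data.

# Minimal sketch (illustrative only): deriving an evaluation set of patients
# from a larger dataset by a patient-level holdout split.
# The column names ("patient_id", "feature", "label") are assumptions.
from sklearn.model_selection import GroupShuffleSplit
import pandas as pd

data = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 4, 5, 6],
    "feature":    [0.2, 0.4, 0.1, 0.9, 0.8, 0.3, 0.7, 0.5],
    "label":      [0, 0, 1, 1, 1, 0, 1, 0],
})

# Group by patient so that no patient contributes records to both sets,
# preventing leakage between training and evaluation data.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(data, groups=data["patient_id"]))
train_set, eval_set = data.iloc[train_idx], data.iloc[test_idx]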
All studies in this review failed to use reporting guidelines, and a large proportion of them lacked adequate detail on participants, making it difficult to replicate, assess and interpret study findings.
PROSPERO registration number: CRD42018099167.