An open competition involving thousands of competitors failed to construct useful abstract classifiers for new diagnostic test accuracy systematic reviews.

Author Information

Department of Internal Medicine, Kyoto Min-iren Asukai Hospital, Kyoto, Japan.

Scientific Research Works Peer Support Group (SRWS-PSG), Osaka, Japan.

Publication Information

Res Synth Methods. 2023 Sep;14(5):707-717. doi: 10.1002/jrsm.1649. Epub 2023 Jun 20.

Abstract

There are currently no abstract classifiers that can be used in new diagnostic test accuracy (DTA) systematic reviews to select primary DTA study abstracts from database search results. Our goal was to develop machine-learning-based abstract classifiers for new DTA systematic reviews through an open competition. We prepared a dataset of abstracts obtained through the database searches of 11 reviews in different clinical areas. As the reference standard, we used the lists of abstracts that required manual full-text review. We randomly split the dataset into a training set, a public test set, and a private test set. Competition participants used the training set to develop classifiers and validated them on the public test set, refining their classifiers based on its performance; they could submit as many times as they wanted during the competition. Finally, we used the private test set to rank the submitted classifiers. To reduce false exclusions, we evaluated classifiers with the Fbeta measure, with beta set to 7. After the competition, we conducted an external validation using a dataset from a cardiology DTA review. We received 13,774 submissions from 1429 teams or individuals over 4 months. The top-ranked classifier achieved an Fbeta score of 0.4036 and a recall of 0.2352 in the external validation. In conclusion, we were unable to develop an abstract classifier with sufficient recall for immediate application to new DTA systematic reviews. Further studies are needed to update and validate classifiers with datasets from other clinical areas.
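As a rough illustration of the evaluation metric described above (a minimal sketch, not the competition's scoring code; the fbeta helper and the example precision/recall values are hypothetical), the following Python snippet computes the Fbeta measure and shows why beta = 7 weights recall far more heavily than precision:

    # Illustrative reimplementation of the F-beta measure (beta = 7) used to rank classifiers.
    def fbeta(precision: float, recall: float, beta: float = 7.0) -> float:
        if precision == 0.0 and recall == 0.0:
            return 0.0
        b2 = beta * beta
        return (1.0 + b2) * precision * recall / (b2 * precision + recall)

    # With beta = 7, recall counts about 49 times as much as precision, so a
    # high-recall, low-precision screen (few false exclusions) scores far better
    # than a precise but low-recall one. The numbers below are made-up examples.
    print(round(fbeta(precision=0.10, recall=0.95), 2))  # 0.81
    print(round(fbeta(precision=0.90, recall=0.30), 2))  # 0.30

On labeled prediction vectors, the same value can be obtained with sklearn.metrics.fbeta_score(y_true, y_pred, beta=7).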
