Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada.
Department of Anesthesiology and Pain Medicine, The Ottawa Hospital, Ottawa, ON, Canada.
Health Info Libr J. 2024 Jun;41(2):136-148. doi: 10.1111/hir.12413. Epub 2021 Nov 18.
Artificial intelligence (AI) offers a promising solution to expedite various phases of the systematic review process such as screening.
We aimed to assess the accuracy of an AI tool in identifying eligible references for a systematic review compared to identification by human assessors.
For the case study (a systematic review of knowledge translation interventions), we used a diagnostic accuracy design in which human raters and the AI system DistillerAI (Evidence Partners, Ottawa, Canada) independently assessed a set of articles (n = 300) for eligibility. We analysed a series of 64 possible confidence levels for the AI's decisions and calculated several standard parameters of diagnostic accuracy for each.
With the lower AI confidence threshold set to 0.1 or greater and the upper threshold set to 0.9 or lower, DistillerAI's article selection decisions closely matched those of the human assessors. Within this range, DistillerAI reached a decision on most articles (93-100%), with a sensitivity of 1.0 and specificity ranging from 0.9 to 1.0.
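To make the threshold-based evaluation concrete, the following is a minimal sketch of how the proportion of decided articles, sensitivity and specificity could be computed for one pair of thresholds, with human decisions as the reference standard. It is not DistillerAI's implementation. The function name `evaluate_thresholds`, the score scale (0 to 1, higher meaning more likely eligible) and the decision rule (exclude at or below the lower threshold, include at or above the upper threshold, no decision in between) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Accuracy(NamedTuple):
    decided_fraction: float          # share of articles the AI decided on
    sensitivity: Optional[float]     # None if no eligible article was decided
    specificity: Optional[float]     # None if no ineligible article was decided


def evaluate_thresholds(
    scores: Sequence[float],
    human_include: Sequence[bool],
    lower: float,
    upper: float,
) -> Accuracy:
    """Compare thresholded AI decisions with human decisions (hypothetical rule).

    Assumed rule: score <= lower means exclude, score >= upper means include,
    anything in between is left undecided and is not counted.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if len(scores) != len(human_include):
        raise ValueError("scores and human_include must have the same length")

    tp = fp = tn = fn = 0
    for score, truth in zip(scores, human_include):
        if score >= upper:
            ai_include = True
        elif score <= lower:
            ai_include = False
        else:
            continue  # undecided: would be left for human screening
        if ai_include and truth:
            tp += 1
        elif ai_include and not truth:
            fp += 1
        elif not ai_include and truth:
            fn += 1
        else:
            tn += 1

    decided = tp + fp + tn + fn
    return Accuracy(
        decided_fraction=decided / len(scores) if scores else 0.0,
        sensitivity=tp / (tp + fn) if (tp + fn) else None,
        specificity=tn / (tn + fp) if (tn + fp) else None,
    )
```

Calling the function once per threshold pair would yield one row of accuracy parameters per confidence level, which mirrors the kind of per-level comparison described above. The scores passed in would have to come from whatever export the screening tool provides; none is assumed here.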
In this case study of 300 articles, DistillerAI appeared to be accurate in its assessment of article eligibility. Further experimentation with DistillerAI is needed to establish its performance in other subject areas.