Daraqel Baraa, Owayda Amer, Khan Haris, Koletsi Despina, Mheissen Samer
Department of Orthodontics, Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine.
Private practice, Harmony Medical Group, Abu Dhabi, United Arab Emirates.
J Dent. 2025 Sep;160:105846. doi: 10.1016/j.jdent.2025.105846. Epub 2025 May 29.
Data extraction is a time-consuming and error-prone step of systematic reviews.
This study aimed to evaluate the agreement between artificial intelligence (AI)-based and human data extraction methods.
Studies published in seven orthodontic journals between 2019 and 2024 were retrieved and included. Fifteen data sets were extracted from each study, both manually and using the Microsoft Bing AI-based tool, by two independent reviewers. Files in Portable Document Format were uploaded to the AI-based tool, and specific data were requested through its chat feature. The association between the data extraction methods and study characteristics was examined, and agreement was evaluated using intraclass correlation and kappa statistics.
A total of 300 orthodontic studies were included. Human and AI-based data extraction differed slightly for publication years and study designs, though these differences were not statistically significant. Minor, non-significant inconsistencies were also found in extracting the number of trial arms and the mean age of participants per group. The AI-based tool was less effective at extracting variables related to the study design (P = 0.017) and the number of centers (P < 0.001). Agreement between human and AI-based extraction ranged from slight (0.16) for the type of study design and moderate (0.45) for study design classification to substantial or perfect (0.65-1.00) for most other variables.
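As a minimal sketch of the agreement statistic reported here, the following pure-Python helper computes Cohen's kappa for two raters and maps the value to the Landis and Koch verbal labels (slight, fair, moderate, substantial, almost perfect) that the thresholds in this abstract follow. The function names are illustrative, not from the study, and the intraclass correlation used for continuous variables is a separate calculation not shown.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who categorized the same items (e.g. human vs. AI extraction)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

def landis_koch_label(kappa):
    """Landis and Koch (1977) interpretation bands for kappa."""
    for cutoff, label in [(0.20, "slight"), (0.40, "fair"),
                          (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= cutoff:
            return label
    return "almost perfect"

# Toy example: two raters classifying four study designs.
k = cohens_kappa(["RCT", "RCT", "cohort", "cohort"],
                 ["RCT", "cohort", "cohort", "cohort"])
print(k, landis_koch_label(k))  # kappa = 0.5, i.e. "moderate"
```

Under this labeling, the abstract's reported values of 0.16, 0.45, and 0.65-1.00 correspond to slight, moderate, and substantial-to-almost-perfect agreement, respectively.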
AI-based data extraction, while effective for straightforward variables, is not fully reliable for complex data extraction. Human input remains essential for ensuring accuracy and completeness in systematic reviews.
AI-based tools can effectively extract straightforward data, potentially reducing the time and effort required for systematic reviews and helping clinicians and researchers process large volumes of data more efficiently. However, human supervision remains essential to maintain the integrity and reliability of clinical evidence.