Parozzi Mauro, Bozzetti Mattia, Lo Cascio Alessio, Napolitano Daniele, Pendoni Roberta, Marcomini Ilaria, Cangelosi Giovanni, Mancin Stefano, Bonacaro Antonio
Medicine and Surgery Department, University of Parma, Via Gramsci 14, 43126 Parma, Italy.
Direction of Health Professions, ASST Cremona, 26100 Cremona, Italy.
Nurs Rep. 2025 Jun 11;15(6):211. doi: 10.3390/nursrep15060211.
Background: The use of standardized assessment tools within the nursing care process is a globally established practice, widely recognized as a foundation for evidence-based evaluation. Accurate translation is essential to ensure their correct and consistent clinical use. While effective, traditional translation procedures are time-consuming and resource-intensive, leading to increasing interest in whether artificial intelligence can assist or streamline this process for nursing researchers. Therefore, this study aimed to assess the quality of translations of nursing assessment scales performed by ChatGPT 4.0. Methods: A total of 31 nursing rating scales comprising 772 items were translated from English to Italian using two different prompts and then underwent a detailed lexicometric analysis. To assess the semantic accuracy of the translations, Sentence-BERT, Jaccard similarity, TF-IDF cosine similarity, and overlap ratio were used. Sensitivity, specificity, AUC, and AUROC were calculated to assess the quality of the translation classification. Paired-sample t-tests were conducted to compare the similarity scores. Results: The Maastricht prompt produced translations that were marginally but consistently more semantically and lexically faithful to the original. While all differences were statistically significant, the corresponding effect sizes indicate that the advantage of the Maastricht prompt is slight but consistent across all measures. Sensitivity was 0.929 (92.9%) for the York prompt and 0.932 (93.2%) for the Maastricht prompt. Specificity and precision remained at 1.000 for both. Conclusions: These findings highlight the potential of prompt engineering as a low-cost, effective method to enhance translation outcomes. Nonetheless, as translation represents only a preliminary step in the full validation process, further studies should investigate the integration of AI-assisted translation within the broader framework of instrument adaptation and validation.
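As an illustration of the kind of lexicometric comparison described in the Methods, the following Python sketch computes Jaccard similarity, TF-IDF cosine similarity, and Sentence-BERT cosine similarity between an original scale item and a candidate translation or back-translation. This is a minimal sketch, not the authors' exact pipeline: the embedding model name, the whitespace tokenizer, and the example item are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact pipeline): comparing an original
# scale item with a candidate translation/back-translation using three of
# the similarity measures named in the abstract. The model name and the
# simple whitespace tokenizer are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer, util


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b) if set_a | set_b else 0.0


def tfidf_cosine(a: str, b: str) -> float:
    """Cosine similarity between TF-IDF vectors of two strings."""
    tfidf = TfidfVectorizer().fit_transform([a, b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


def sbert_cosine(a: str, b: str,
                 model_name: str = "paraphrase-multilingual-MiniLM-L12-v2") -> float:
    """Cosine similarity between Sentence-BERT embeddings (multilingual model assumed)."""
    model = SentenceTransformer(model_name)
    emb = model.encode([a, b], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]).item())


if __name__ == "__main__":
    # Hypothetical item pair used only to demonstrate the three measures.
    original = "I feel tense or wound up"
    back_translation = "I feel tense or keyed up"
    print("Jaccard:      ", round(jaccard_similarity(original, back_translation), 3))
    print("TF-IDF cosine:", round(tfidf_cosine(original, back_translation), 3))
    print("SBERT cosine: ", round(sbert_cosine(original, back_translation), 3))
```

In a back-translation workflow of this kind, lexical measures (Jaccard, TF-IDF cosine) capture surface word overlap, while the Sentence-BERT score captures semantic closeness even when the wording differs, which is why the study reports both families of metrics.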