Sebo Paul
University of Geneva, Geneva, Switzerland.
J Med Internet Res. 2024 Dec 9;26:e57667. doi: 10.2196/57667.
In the realm of scientific research, peer review serves as a cornerstone for ensuring the quality and integrity of scholarly papers. Recent trends in promoting transparency and accountability have led some journals to publish peer-review reports alongside papers.
ChatGPT-4 (OpenAI) was used to quantitatively assess sentiment and politeness in peer-review reports from high-impact medical journals. The objective was to explore gender and geographical disparities to enhance inclusivity within the peer-review process.
All 9 general medical journals with an impact factor >2 that publish peer-review reports were identified. A total of 12 research papers per journal were randomly selected, all published in 2023. The names of the first and last authors, along with the first author's country of affiliation, were collected, and the gender of both the first and last authors was determined. For each review, ChatGPT-4 was asked to evaluate a "sentiment score," ranging from -100 (negative) through 0 (neutral) to +100 (positive), and a "politeness score," ranging from -100 (rude) through 0 (neutral) to +100 (polite). The measurements were repeated 5 times, and the minimum and maximum values were removed. The mean sentiment and politeness scores for each review were computed from the remaining values and then summarized using the median and interquartile range. Statistical analyses included Wilcoxon rank-sum tests, Kruskal-Wallis rank tests, and negative binomial regressions.
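The aggregation described above (5 repeated model scores per review, drop the single lowest and highest, average the middle 3, then summarize across reviews with median and interquartile range) can be sketched as follows. This is a minimal illustration with made-up scores, not the authors' code; the quartile convention shown (medians of the lower and upper halves, excluding the overall median when the count is odd) is one common choice and is an assumption here.

```python
from statistics import mean, median

def trimmed_mean(scores):
    """Average repeated measurements after dropping the single
    lowest and highest value (5 runs -> mean of the middle 3)."""
    if len(scores) < 3:
        raise ValueError("need at least 3 measurements")
    s = sorted(scores)
    return mean(s[1:-1])

def median_iqr(values):
    """Median and interquartile range, with Q1/Q3 taken as the
    medians of the lower and upper halves (median excluded when
    the number of values is odd) -- one common convention."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])
    q3 = median(s[half + len(s) % 2:])
    return median(s), (q1, q3)

# Hypothetical repeated sentiment scores for three reviews
runs = [
    [55, 60, 62, 58, 90],   # 55 and 90 dropped -> mean(58, 60, 62) = 60
    [20, 25, 30, 28, 22],
    [70, 72, 68, 71, 69],
]
per_review = [trimmed_mean(r) for r in runs]
med, (q1, q3) = median_iqr(per_review)
print(per_review, med, (q1, q3))
```

Dropping the extremes before averaging makes the per-review score robust to a single outlying model run, which matters because repeated ChatGPT-4 queries on the same text can vary.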
Analysis of 291 peer-review reports corresponding to 108 papers revealed notable regional disparities. Papers from the Middle East, Latin America, or Africa had lower sentiment and politeness scores than those from North America, Europe, or Pacific and Asia (sentiment scores: 27 vs 60 and 62, respectively; politeness scores: 43.5 vs 67 and 65, respectively; adjusted P=.02). No significant differences based on authors' gender were observed (all P>.05).
Notable regional disparities were found, with papers from the Middle East, Latin America, and Africa demonstrating significantly lower scores, while no discernible differences were observed based on authors' gender. The absence of gender-based differences suggests that gender biases may not manifest as prominently as other forms of bias within the context of peer review. The study underscores the need for targeted interventions to address regional disparities in peer review and advocates for ongoing efforts to promote equity and inclusivity in scholarly communication.