
Reliability of trachoma clinical grading--assessing grading of marginal cases.

Author Information

Rahman Salman A, Yu Sun N, Amza Abdou, Gebreselassie Sintayehu, Kadri Boubacar, Baido Nassirou, Stoller Nicole E, Sheehan Joseph P, Porco Travis C, Gaynor Bruce D, Keenan Jeremy D, Lietman Thomas M

Affiliations

F.I. Proctor Foundation, San Francisco, California, United States of America.

Programme Nationale des Soins Oculaire, Niamey, Niger.

Publication Information

PLoS Negl Trop Dis. 2014 May 1;8(5):e2840. doi: 10.1371/journal.pntd.0002840. eCollection 2014 May.

Abstract

BACKGROUND

Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess performance of trainees on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score.
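To make the reliability-error idea concrete, below is a minimal Python sketch of the Murphy decomposition of the Brier score into reliability, resolution, and uncertainty terms. This is not the authors' code; the function name and inputs are illustrative only, assuming forecasts are a grader's probability that a sign (e.g., TF) is present and outcomes are the 0/1 consensus labels.

```python
import numpy as np

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score for binary outcomes.

    forecasts: predicted probabilities that the sign is present.
    outcomes:  0/1 consensus labels.
    Returns (reliability, resolution, uncertainty); the Brier score
    equals reliability - resolution + uncertainty.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(outcomes)
    base_rate = outcomes.mean()

    reliability = 0.0
    resolution = 0.0
    for f in np.unique(forecasts):        # bin cases by distinct forecast value
        mask = forecasts == f
        n_k = mask.sum()
        obs_k = outcomes[mask].mean()      # observed frequency in this bin
        reliability += n_k * (f - obs_k) ** 2
        resolution += n_k * (obs_k - base_rate) ** 2

    return reliability / n, resolution / n, base_rate * (1.0 - base_rate)
```

For a grader who only makes binary (0/1) calls, the forecasts collapse to two bins and the reliability term directly reflects how often a confident call disagrees with the consensus.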

METHODS AND FINDINGS

We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation-follicular (TF) and trachomatous inflammation-intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we assessed the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases.
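To illustrate the subset comparison, the sketch below shows how a majority consensus grade and a unanimous subset might be derived from three expert grades, and how Cohen's kappa could be computed on all cases versus the unanimous subset. The grade arrays and variable names are hypothetical, and scikit-learn's cohen_kappa_score is used here rather than the authors' actual analysis code.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical grades: rows = cases, columns = 3 experienced graders (1 = TF present, 0 = absent)
expert_grades = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],   # borderline: experts disagree
    [0, 1, 0],   # borderline
    [1, 1, 1],
])
trainee_grades = np.array([1, 0, 1, 1, 1])

consensus = (expert_grades.sum(axis=1) >= 2).astype(int)   # majority grade
unanimous = expert_grades.std(axis=1) == 0                 # all 3 experts agree

kappa_all = cohen_kappa_score(trainee_grades, consensus)
kappa_unanimous = cohen_kappa_score(trainee_grades[unanimous], consensus[unanimous])

print(f"kappa, all cases:        {kappa_all:.2f}")
print(f"kappa, unanimous subset: {kappa_unanimous:.2f}")
```

In this toy example the trainee agrees perfectly on the unanimous cases, so excluding the two borderline cases raises kappa, mirroring the pattern reported in the study.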

CONCLUSIONS

The kappa statistic is widely used for certifying trachoma field graders. Exclusion of borderline cases, which even experienced graders disagree on, increases apparent agreement with the kappa statistic. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.
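One way such trainee-level feedback could look in practice is sketched below: a count of a trainee's over-calls and under-calls against the consensus grade on the borderline subset. The arrays are hypothetical and this is only an illustration of the kind of directional feedback the abstract describes, not the authors' procedure.

```python
import numpy as np

# Hypothetical borderline subset: consensus labels and one trainee's calls (1 = sign present)
consensus_borderline = np.array([0, 0, 1, 0, 1, 0])
trainee_borderline   = np.array([1, 1, 1, 0, 0, 1])

over_calls  = np.sum((trainee_borderline == 1) & (consensus_borderline == 0))
under_calls = np.sum((trainee_borderline == 0) & (consensus_borderline == 1))

print(f"over-calls (graded present, consensus absent): {over_calls}")
print(f"under-calls (graded absent, consensus present): {under_calls}")
```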


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42f5/4006735/102bd8c666df/pntd.0002840.g001.jpg
