The University of Texas at Dallas, Richardson, Texas, USA.
National Institute of Standards and Technology, Gaithersburg, Maryland, USA.
Behav Res Methods. 2024 Mar;56(3):1244-1259. doi: 10.3758/s13428-023-02092-7. Epub 2023 Jun 9.
Measures of face-identification proficiency are essential to ensure accurate and consistent performance by professional forensic face examiners and others who perform face-identification tasks in applied scenarios. Current proficiency tests rely on static sets of stimulus items and so cannot be administered validly to the same individual multiple times. To create a proficiency test, a large number of items of "known" difficulty must be assembled. Multiple tests of equal difficulty can then be constructed using subsets of items. We introduce the Triad Identity Matching (TIM) test and evaluate it using item response theory (IRT). Participants view face-image "triads" (N = 225) (two images of one identity, one image of a different identity) and select the different identity. In Experiment 1, university students (N = 197) showed wide-ranging accuracy on the TIM test, and IRT modeling demonstrated that the TIM items span various difficulty levels. In Experiment 2, we used IRT-based item metrics to partition the test into subsets of specific difficulties. Simulations showed that subsets of the TIM items yielded reliable estimates of subject ability. In Experiments 3a and 3b, we found that the student-derived IRT model reliably evaluated the ability of non-student participants and that ability generalized across different test sessions. In Experiment 3c, we show that TIM test performance correlates with other common face-recognition tests. In summary, the TIM test provides a starting point for developing a framework that is flexible and calibrated to measure proficiency across various ability levels (e.g., professionals or populations with face-processing deficits).
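To illustrate how IRT item difficulties can support multiple test forms of similar difficulty, the following is a minimal Python sketch. It is not the authors' procedure: the abstract does not state which IRT model or partitioning method was used. The function names (`p_correct`, `partition_by_difficulty`), the logistic item response function with a guessing floor, the floor value of 1/3 (chance level when choosing one of three images), and the serpentine assignment are all assumptions made for illustration.

```python
import math


def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a logistic IRT model.

    theta: subject ability; b: item difficulty; a: item discrimination;
    c: guessing floor (hypothetical; 1/3 would be chance on a triad).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def partition_by_difficulty(difficulties, n_forms):
    """Split item indices into n_forms subsets of similar mean difficulty.

    Items are sorted by difficulty and dealt out in serpentine order
    (0, 1, ..., n-1, n-1, ..., 1, 0, ...), an assumed heuristic rather
    than the method used in the paper.
    """
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    forms = [[] for _ in range(n_forms)]
    for rank, idx in enumerate(order):
        block, pos = divmod(rank, n_forms)
        form = pos if block % 2 == 0 else n_forms - 1 - pos
        forms[form].append(idx)
    return forms


if __name__ == "__main__":
    # Hypothetical item difficulties, not values from the TIM test.
    b_values = [-1.5, -0.5, 0.5, 1.5]
    for form in partition_by_difficulty(b_values, 2):
        mean_b = sum(b_values[i] for i in form) / len(form)
        print(form, mean_b)
    print(p_correct(theta=0.0, b=0.0, c=1.0 / 3.0))
```

In this sketch, an item's difficulty `b` is the ability level at which the logistic part of the curve reaches its midpoint, so forms with matched mean `b` should be roughly comparable for the same subject. A real calibration would estimate the item parameters from response data with a dedicated IRT package.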