Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139.
Proc Natl Acad Sci U S A. 2022 Jan 4;119(1). doi: 10.1073/pnas.2110013119.
The recent emergence of machine-manipulated media raises an important societal question: How can we know whether a video that we watch is real or fake? In two online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary human observers with the leading computer vision deepfake detection model and find them similarly accurate, though they make different kinds of mistakes. Participants with access to the model's prediction are more accurate than either humans or the model alone, but inaccurate model predictions often decrease participants' accuracy. To probe the relative strengths and weaknesses of humans and machines as detectors of deepfakes, we examine human and machine performance across video-level features, and we evaluate the impact of preregistered randomized interventions on deepfake detection. We find that manipulations designed to disrupt visual processing of faces hinder human participants' performance while mostly not affecting the model's performance, suggesting a role for specialized cognitive capacities in explaining human deepfake detection performance.
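The human-machine comparison above can be made concrete with a small simulation. The following Python sketch is purely illustrative and uses invented numbers, not the paper's data or model: it assumes synthetic ground-truth labels and Gaussian-noised confidence scores to show why, when human and model errors are independent, averaging the two scores can classify more videos correctly than either judge alone.

import random

random.seed(0)

N = 10_000
labels = [random.randint(0, 1) for _ in range(N)]  # 1 = deepfake, 0 = authentic

def noisy_score(label: int, sigma: float = 0.5) -> float:
    """A confidence score centered on the true label, with Gaussian noise.

    The noise level is an arbitrary illustrative choice, not an estimate
    of real human or model performance.
    """
    return random.gauss(label, sigma)

# Independent noisy judgments from a simulated human and a simulated model.
human = [noisy_score(y) for y in labels]
model = [noisy_score(y) for y in labels]

# A simple "human with model access" condition: average the two scores.
combined = [(h + m) / 2 for h, m in zip(human, model)]

def accuracy(scores, truth):
    # Classify a video as a deepfake when its score exceeds 0.5.
    return sum((s > 0.5) == bool(y) for s, y in zip(scores, truth)) / len(truth)

print(f"human alone:   {accuracy(human, labels):.3f}")
print(f"model alone:   {accuracy(model, labels):.3f}")
print(f"human + model: {accuracy(combined, labels):.3f}")

The simulated gain hinges on the two error sources being independent: when the model's score points the wrong way, averaging pulls the human's judgment toward the error, which is consistent with the abstract's caveat that inaccurate model predictions often decrease participants' accuracy.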