Geman Donald, Geman Stuart, Hallonquist Neil, Younes Laurent
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21287; and.
Division of Applied Mathematics, Brown University, Providence, RI 02912
Proc Natl Acad Sci U S A. 2015 Mar 24;112(12):3618-23. doi: 10.1073/pnas.1422953112. Epub 2015 Mar 9.
Today, computer vision systems are tested by their accuracy in detecting and localizing instances of objects. As an alternative, and motivated by the ability of humans to provide far richer descriptions and even tell a story about an image, we construct a "visual Turing test": an operator-assisted device that produces a stochastic sequence of binary questions from a given test image. The query engine proposes a question; the operator either provides the correct answer or rejects the question as ambiguous; the engine proposes the next question ("just-in-time truthing"). The test is then administered to the computer-vision system, one question at a time. After the system's answer is recorded, the system is provided the correct answer and the next question. Parsing is trivial and deterministic; the system being tested requires no natural language processing. The query engine employs statistical constraints, learned from a training set, to produce questions with essentially unpredictable answers-the answer to a question, given the history of questions and their correct answers, is nearly equally likely to be positive or negative. In this sense, the test is only about vision. The system is designed to produce streams of questions that follow natural story lines, from the instantiation of a unique object, through an exploration of its properties, and on to its relationships with other uniquely instantiated objects.
如今,计算机视觉系统通过检测和定位物体实例的准确性来进行测试。作为一种替代方法,并且受人类能够提供更丰富描述甚至讲述图像故事的能力的启发,我们构建了一个“视觉图灵测试”:一种由操作员辅助的设备,它从给定的测试图像中生成一系列随机的二元问题。查询引擎提出一个问题;操作员要么提供正确答案,要么以问题模糊为由拒绝该问题;引擎提出下一个问题(“即时验证”)。然后一次向计算机视觉系统提出一个问题来进行测试。记录系统的答案后,向系统提供正确答案和下一个问题。解析过程简单且具有确定性;被测试的系统不需要自然语言处理。查询引擎利用从训练集中学习到的统计约束来生成答案基本不可预测的问题——给定问题历史及其正确答案,一个问题的答案几乎同样可能是肯定或否定的。从这个意义上说,该测试仅关乎视觉。该系统旨在生成遵循自然故事情节的问题流,从单个独特物体的实例化开始,通过探索其属性,再到它与其他独特实例化物体的关系。