Pandey Lalit, Lee Donsuk, Wood Samantha M W, Wood Justin N
Informatics Department, Indiana University, Bloomington, Indiana, United States of America.
Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America.
PLoS Comput Biol. 2024 Dec 2;20(12):e1012600. doi: 10.1371/journal.pcbi.1012600. eCollection 2024 Dec.
How do newborns learn to see? We propose that visual systems are space-time fitters, meaning visual development can be understood as a blind fitting process (akin to evolution) in which visual systems gradually adapt to the spatiotemporal data distributions in the newborn's environment. To test whether space-time fitting is a viable theory of learning how to see, we performed parallel controlled-rearing experiments on newborn chicks and deep neural networks (DNNs), including CNNs and transformers. First, we raised newborn chicks in impoverished environments containing a single object, then simulated those environments in a video game engine. Second, we recorded first-person images from agents moving through the virtual animal chambers and used those images to train DNNs. Third, we compared the viewpoint-invariant object recognition performance of the chicks and DNNs. When DNNs received the same visual diet (training data) as the chicks, the models developed object recognition skills similar to those of the chicks. DNNs that used time as a teaching signal (space-time fitters) also showed patterns of successes and failures across the test viewpoints similar to those of the chicks. Thus, DNNs can learn object recognition in the same impoverished environments as newborn animals. We argue that space-time fitters can serve as formal scientific models of newborn visual systems, providing image-computable models for studying how newborns learn to see from raw visual experiences.
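The phrase "time as a teaching signal" can be illustrated with a toy temporal contrastive objective. The abstract does not say which learning objective the DNNs used, so the sketch below is only one assumed possibility: frames that occur close together in time are treated as positive pairs, and an InfoNCE-style loss rewards embeddings that place those frames near each other. The function names, the `window` and `temperature` parameters, and the two-dimensional embeddings are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_contrastive_loss(embeddings, window=1, temperature=0.5):
    """Illustrative InfoNCE-style loss over a sequence of frame embeddings.

    Assumption (not from the abstract): frames at most `window` steps apart
    are positives, and all other frames in the sequence act as negatives.
    Requires at least two frames.
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity of frame i to every other frame.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        for j, score in logits.items():
            if abs(i - j) <= window:
                # Negative log-probability of picking the temporal neighbor.
                total += log_denom - score
                count += 1
    return total / count


# Toy embeddings: in `smooth`, neighboring frames are mostly similar;
# `shuffled` holds the same vectors in an order that breaks that continuity.
smooth = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_smooth = temporal_contrastive_loss(smooth)
loss_shuffled = temporal_contrastive_loss(shuffled)
print(loss_smooth < loss_shuffled)
```

In this toy case the loss is lower when temporally adjacent frames have similar embeddings, which is the sense in which temporal proximity can stand in for labels. A real model would minimize such a loss over network weights using the recorded first-person image sequences.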