k.a. algorithms) that can be experimentally distinguished. This synergy is leading to high-performing artificial vision systems (Pinto et al., 2008a, Pinto et al., 2009b and Serre et al., 2007b). We expect this pace to accelerate, to fully explain human abilities, to reveal ways for extending and generalizing beyond those abilities, and to expose ways to repair broken neuronal circuits and augment normal circuits. Progress toward understanding object recognition is driven by linking
phenomena at different levels of abstraction. “Phenomena” at one level of abstraction (e.g., behavioral success on well-designed benchmark tests) are best explained by “mechanisms” at one level of abstraction below (e.g., a neuronal spiking population
code in inferior temporal cortex, IT). Notably, these “mechanisms” are themselves “phenomena” that also require mechanistic explanations at an even lower level of abstraction (e.g., neuronal connectivity, intracellular events). Progress is facilitated by good intuitions about the most useful levels of abstraction as well as measurements of well-chosen phenomena at nearby levels. It then becomes crucial to define alternative hypotheses that link those sets of phenomena and to determine those that explain the most data and generalize outside the specific conditions on which they were tested. In practice, we do not require all levels of abstraction and their links to be fully understood, but rather that both the phenomena and the linking hypotheses be understood sufficiently well to achieve the broader policy missions of the research (e.g., building artificial vision systems, visual prosthetics, repairing disrupted brain circuits, etc.). To that end, we review three sets of phenomena at three levels of abstraction (core recognition behavior, the IT population representation, and IT single-unit responses), and we describe the links between these phenomena (sections 1 and 2 below). We then consider how the architecture and plasticity
of the ventral visual stream might produce a solution for object recognition in IT (section 3), and we conclude by discussing key open directions (section 4). Vision accomplishes many tasks besides object recognition, including object tracking, segmentation, obstacle avoidance, object grasping, etc., and these tasks are beyond the scope of this review. For example, studies point to the importance of the dorsal visual stream for supporting the ability to guide the eyes or covert processing resources (spatial “attention”) toward objects (e.g., Ikkai et al., 2011, Noudoost et al., 2010 and Valyear et al., 2006) and to shape the hand to manipulate an object (e.g., Goodale et al., 1994 and Murata et al., 2000), and we do not review that work here (see Cardoso-Leite and Gorea, 2010, Jeannerod et al., 1995, Konen and Kastner, 2008 and Sakata et al., 1997). Instead, we and others define object recognition as the ability to assign labels (e.g.