Physical and Social Scene Understanding


Computer vision has made significant progress in locating and recognizing objects in real images. However, beyond this “what is where” challenge, it still lacks the ability to understand scenes in the way that characterizes human visual experience. The mission of this workshop is to (a) identify the key domains in which human visual perception and cognition outperform computer vision; (b) formalize the computational challenges in these domains; and (c) provide promising frameworks for solving these challenges through studies in cognitive science and computer vision. We propose FPIC as four key domains beyond “what is where”:

Functionality (e.g., what can be done with this slotted spoon?)
Physics (e.g., will the spoon be able to pick up the meatball?)
Intentionality (e.g., is the person trying to scoop up the cheese or point toward it?)
Causality (e.g., why does the gravy pass through the spoon?)

