A glance at an object is often sufficient to recognize it and recover fine details of its shape and appearance. How can vision be so rich, yet so fast? The analysis-by-synthesis approach offers an account of the richness of our percepts, but it is generally considered too slow to explain perception in the brain. Here we propose a version of analysis-by-synthesis that can be implemented efficiently, by combining a generative model based on a realistic 3D computer graphics engine with a recognition model based on a deep convolutional network. The recognition model initializes inference in the generative model, which brief runs of MCMC then refine. The model can reconstruct the approximate shape and texture of a novel face from a single view; it accounts quantitatively for human behavior in “hard” recognition tasks; and it qualitatively matches neural responses in a network of face-selective brain areas.
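The inference scheme described above, in which a recognition model proposes initial latents that brief runs of MCMC then refine, can be sketched in miniature. The `render`, `recognize`, and `refine_mcmc` functions below are illustrative stand-ins chosen for this sketch, not the paper's 3D graphics engine or convolutional network: a toy nonlinear map plays the generative model, a crude approximate inverse plays the recognition network, and a short random-walk Metropolis chain (keeping the best sample seen) plays the refinement step.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(z):
    """Stand-in generative model: nonlinear map from latents to an 'image'."""
    return np.tanh(z) + 0.1 * z**2

def log_likelihood(z, image, sigma=0.1):
    """Gaussian pixel likelihood of the observed image under latents z."""
    resid = image - render(z)
    return -0.5 * np.sum(resid**2) / sigma**2

def recognize(image):
    """Stand-in recognition model: a fast approximate inverse of render
    (it ignores the quadratic term, so it is only a rough initializer)."""
    return np.arctanh(np.clip(image, -0.99, 0.99))

def refine_mcmc(z0, image, steps=200, step_size=0.05):
    """Brief random-walk Metropolis refinement starting from z0,
    returning the best (highest-likelihood) sample encountered."""
    z, ll = z0.copy(), log_likelihood(z0, image)
    best_z, best_ll = z, ll
    for _ in range(steps):
        prop = z + step_size * rng.standard_normal(z.shape)
        ll_prop = log_likelihood(prop, image)
        # Metropolis accept/reject on the log-likelihood ratio.
        if np.log(rng.random()) < ll_prop - ll:
            z, ll = prop, ll_prop
            if ll > best_ll:
                best_z, best_ll = z, ll
    return best_z, best_ll

# Demo: true latents -> observed image; recognition init + MCMC refinement.
z_true = 0.5 * rng.standard_normal(5)
image = render(z_true)
z_init = recognize(image)
z_refined, ll_refined = refine_mcmc(z_init, image)
print("init log-lik:", log_likelihood(z_init, image))
print("refined log-lik:", ll_refined)
```

Because the chain tracks its best sample and starts from the recognition model's guess, the refined fit is never worse than the initialization; in the full model, the same division of labor lets a fast bottom-up pass do most of the work while a few top-down MCMC steps recover the details.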