We present a neural mechanism for interpreting and executing visually presented commands. These are simple verb-noun commands (such as WRITE THREE) and can also include conditionals ([if] SEE SEVEN, [then] WRITE THREE). We apply this to a simplified version of our large-scale functional brain model “Spaun”, where input is a 28x28 pixel visual stimulus, with a different pattern for each word. Output controls a simulated arm, giving hand-written answers. Cortical areas for categorizing, storing, and interpreting information are controlled by the basal ganglia (action selection) and thalamus (routing). The final model has ~100,000 LIF spiking neurons. We show that the model is extremely robust to neural damage (40% of neurons can be destroyed before performance drops significantly). Performance also drops for visual display times less than 250ms. Importantly, the system also scales to large vocabularies (~100,000 nouns and verbs) without requiring an exponentially large number of neurons.