False-belief tasks have mainly been associated with the explanatory notion of the theory of mind. However, it has often been pointed out that this kind of high-level reasoning is computationally expensive and time-consuming. Viewed from an embodied intelligence perspective, failure in a false-belief test can result from an impaired ability to recognize and track others' sensorimotor contingencies and affordances. On this view, social cognition is explained in terms of low-level signals rather than high-level reasoning. In this work, we present a generative model for optimal action selection that can simultaneously be employed to predict others' actions. We demonstrate how tracking others' sensorimotor signals can give rise to correct false-belief inferences, while a lack of such tracking leads to failure. With this work, we want to emphasize the importance of sensorimotor contingencies in social cognition, which might be a key to artificial, socially intelligent systems.