People are adept at perceiving interactions from the movements of simple shapes, but the underlying mechanism remains unknown. Previous studies have often used object movements defined by experimenters. The present study used aerial videos recorded by drones in a real-life environment to generate decontextualized motion stimuli, in which the motion trajectories of the displayed elements were the only visual input. We measured human judgments of interactiveness between two moving elements, as well as the dynamic change of these judgments over time. To account for human performance in this task, we developed a hierarchical model that represents interactivity using latent variables and learns the distribution of critical movement features that signal potential interactivity. The model provides a good fit to human judgments and can also be generalized to the original Heider-Simmel animations (1944). In addition, the model can synthesize decontextualized animations with a controlled degree of interactiveness, providing a viable tool for studying animacy and social perception.
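The abstract does not specify the model's exact form. As a rough illustration of how a latent interactivity variable can yield a judgment that changes frame by frame, here is a minimal sketch that deliberately swaps in a simpler technique: a two-state hidden Markov model with forward filtering. It is not the authors' hierarchical model. The single per-frame feature (inter-element distance), the Gaussian emission parameters, the switching probability, and the function names `gaussian_pdf` and `interactiveness_over_time` are all hypothetical assumptions chosen for illustration. In the described model, feature distributions are learned from data rather than fixed by hand.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def interactiveness_over_time(features, stay_prob=0.9, prior=0.5,
                              interactive=(1.0, 0.5), independent=(3.0, 1.0)):
    """Return P(interactive at frame t | features up to t) for every frame.

    Hypothetical sketch: a two-state HMM (interactive vs. independent)
    filtered forward in time. `features` is an assumed per-frame scalar
    movement feature (e.g. distance between the two elements).
    `interactive` and `independent` are assumed (mean, std) pairs for that
    feature under each latent state; they are fixed here, not learned.
    """
    p = prior  # current belief that the pair is interacting
    posteriors = []
    for x in features:
        # Predict step: the latent state may switch between frames.
        p_pred = stay_prob * p + (1 - stay_prob) * (1 - p)
        # Update step: weight each state by the likelihood of this frame's feature.
        weighted_int = p_pred * gaussian_pdf(x, *interactive)
        weighted_ind = (1 - p_pred) * gaussian_pdf(x, *independent)
        p = weighted_int / (weighted_int + weighted_ind)
        posteriors.append(p)
    return posteriors


if __name__ == "__main__":
    # Two elements approaching each other: belief in interaction rises over time.
    approaching = [3.2, 2.8, 1.5, 1.0, 0.9, 1.1]
    for t, prob in enumerate(interactiveness_over_time(approaching)):
        print(f"frame {t}: P(interactive) = {prob:.3f}")
```

The predict/update split is what lets the judgment drift as evidence accumulates and also recover when the elements separate, which loosely mirrors the time-varying human ratings the abstract describes.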