Humans make predictions about physical environments and future events through inference. Previous research has proposed that a common-sense engine implementing probabilistic programming builds an internal model of the environment, and that simulations run on this internal model support such inferences. Battaglia et al. (2013) demonstrated an application of this formulation to physical scene understanding, specifically to stability judgments about block towers. Here we extend this formulation by treating subjects' eye movements as a process of sampling the environment, and propose that the underlying common-sense model guides gaze toward the features of the scene that carry information relevant to stability judgments. We compare a base probabilistic model with one that also takes saccade statistics into account, and argue that this additional information improves the model's predictions of subjects' judgments.
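The simulation-based inference summarized above can be illustrated with a toy sketch. This is our own simplification under stated assumptions, not the model of Battaglia et al. (2013) nor the one proposed here. The tower is reduced to horizontal block positions with equal masses. Stability is a static center-of-mass check rather than a physics-engine simulation. The hypothetical `gaze_sigmas` helper assumes that each fixation on a block reduces perceptual noise as `base_sigma / sqrt(1 + k)`. All names and parameters are illustrative.

```python
import math
import random

# Hypothetical toy setup (not from the source): a tower is a list of blocks
# ordered bottom (index 0) to top, each with a horizontal center x and a width.
# All blocks are assumed to have equal mass.

def is_stable(xs, widths):
    """Static check: for every block i >= 1, the center of mass of the
    sub-tower i..top must lie over the footprint of block i - 1."""
    n = len(xs)
    for i in range(1, n):
        com = sum(xs[i:]) / (n - i)  # equal masses, so CoM is the mean x
        if abs(com - xs[i - 1]) > widths[i - 1] / 2:
            return False
    return True


def prob_fall(xs, widths, sigmas, n_samples=1000, seed=0):
    """Monte Carlo estimate of P(tower falls): perturb each block's perceived
    position with Gaussian noise of its own std, then run the stability check."""
    rng = random.Random(seed)
    falls = 0
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, s) for x, s in zip(xs, sigmas)]
        if not is_stable(noisy, widths):
            falls += 1
    return falls / n_samples


def gaze_sigmas(fixation_counts, base_sigma):
    """Hypothetical gaze model: perceptual noise on a block shrinks with the
    number of fixations k it received, as base_sigma / sqrt(1 + k)."""
    return [base_sigma / math.sqrt(1 + k) for k in fixation_counts]


if __name__ == "__main__":
    xs, widths = [0.0, 0.3, 0.6], [1.0, 1.0, 1.0]
    base = prob_fall(xs, widths, [0.2] * 3)                    # base model
    gaze = prob_fall(xs, widths, gaze_sigmas([0, 2, 4], 0.2))  # gaze-informed
    print(f"base P(fall) = {base:.3f}, gaze-informed P(fall) = {gaze:.3f}")
```

In this sketch, the two models differ only in the per-block noise fed to the simulation. Relating which blocks receive fixations, and how strongly fixation sharpens perception, to recorded saccade statistics is where the substantive modelling would lie.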