The rich literature on multiple object tracking (MOT) conclusively demonstrates that humans are able to visually track a small number of objects (Pylyshyn & Storm 1988; Alvarez & Franconeri 2007). There is considerably less agreement on which perceptual and cognitive processes are involved. While it is clear that MOT is attentionally demanding, various accounts of MOT performance centrally involve pre-attentive mechanisms as well. In this paper we present an account of object tracking in the ARCADIA framework (Bridewell & Bello 2015) that treats MOT as dependent upon both pre-attentive and attention-bound processes. We show that, with minimal additions, this model replicates a variety of core phenomena in the MOT literature and provides an algorithmic explanation of human performance limitations.