Computational models of semantics have emerged as powerful tools for natural language processing. Recent work has developed models to handle compositionality, but these models have typically been evaluated on large, uncontrolled corpora. In this paper, we constructed a controlled set of phrase pairs and collected phrase similarity judgments, revealing novel insights into human semantic representation. None of the computational models that we considered were able to capture the pattern of human judgments. The results of a second experiment, using the same stimuli with a transformational judgment task, support a transformational account of similarity, according to which the similarity between phrases is inversely related to the number of edits required to transform one mental model into another. Taken together, our results indicate that popular models of compositional semantics do not capture important facets of human semantic representation.