Frequent words tend to be short, and many researchers have proposed that this relationship reflects a tendency toward efficient communication. Recent work has sought to formalize this observation within information theory, which establishes a limit on communicative efficiency known as channel capacity. In this paper, I first show that the compositional structure of natural language prevents natural language communication from approaching channel capacity, but that a different limit, one that incorporates word probability in context, may be achievable. Next, I present two corpus studies of three typologically diverse languages that provide evidence that languages change over time toward this achievable limit. These results suggest that natural language optimizes for efficiency over time, and does so in a way that is appropriate for compositional codes.