Humans, as a cooperative species, need to coordinate in order to achieve goals that are beyond the ability of any one individual. Modeling the emergence of coordination can provide ways to understand how successful joint action is established. In this paper, we investigate the problem of two agents coordinating to move an object to one agent’s target location through complementary action. We formalize the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), a decision-theoretic framework. We use multiagent Q-learning as a heuristic to obtain reasonable solutions to our problem and investigate how different agent architectures, which represent hypotheses about agent abilities and internal representations, affect the convergence of the learning process. Our results show that, in this problem, agents using external signals or internal representations not only eventually perform better than those coordinating in physical space alone but also outperform agents that have independent knowledge of the goal. We then employ information-theoretic measures to quantify the restructuring of information flow over the learning process. We find that the informativeness of the external environment state about agents’ actions varies with the agents’ architecture. Finally, we discuss how these results, and the modeling technique in general, can address questions regarding the origins of communication.