Speech sounds within a linguistic system are both categorical and combinatorial, and there are constraints on how the elements can be recombined. To investigate the origins of this combinatorial structure, we conducted an iterated learning experiment with human participants, studying the transmission of an artificial system of sounds. Participants learned and recalled a system of sounds produced with a slide whistle, an instrument that is both intuitive and non-linguistic. Each participant was exposed to the recall output of the previous participant. Transmission from participant to participant caused the system to change, becoming cumulatively more learnable and more structured. This shows that combinatorial structure can emerge culturally in an artificial sound system through iterated learning.
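
The transmission design described above can be sketched in code. This is only a minimal illustration under stated assumptions, not the procedure used in the study: the representation of a whistle as a list of pitch values, the function names `run_chain` and `toy_recall`, and the toy learner itself (which stands in for a human participant) are all hypothetical.

```python
from typing import Callable, List

# Assumed representation: a whistle is a sequence of pitch samples,
# and a system is a list of whistles.
Signal = List[float]
System = List[Signal]


def run_chain(
    initial: System,
    recall: Callable[[System], System],
    generations: int,
) -> List[System]:
    """Return the system at every generation, starting with the initial one."""
    history = [initial]
    for _ in range(generations):
        # Each learner is trained on the recall output of the previous learner.
        history.append(recall(history[-1]))
    return history


def toy_recall(system: System) -> System:
    """Hypothetical learner: imperfect recall that snaps each pitch to the nearest 0.25."""
    return [[round(pitch * 4) / 4 for pitch in signal] for signal in system]


chain = run_chain([[0.1, 0.62, 0.9]], toy_recall, generations=3)
```

In this toy setup, `chain[0]` is the initial system and `chain[k]` is what the k-th learner reproduced. The point of the sketch is the chain structure, in which each output becomes the next input; the rounding rule is a placeholder bias and makes no claim about how human participants actually recall whistled signals.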