Research on artificial language acquisition has shown that inserting short subliminal gaps into a continuous stream of speech notably affects how human listeners interpret speech tokens constructed from the syllabic constituents of the language. It has been argued that the observed results cannot be explained by a single statistical learning mechanism. On the other hand, computational simulations have shown that, as long as the gaps are treated as structurally significant units of the language, a single distributional learning model can account for the behavioral results. However, why the subliminal gaps interfere with processing of the language at a linguistic level is currently unknown. In the present work, we analyze the distributional properties of purely acoustic representations of speech, showing that a system that performs unsupervised learning of transition probabilities between short-term acoustic events can replicate the main behavioral findings without any a priori linguistic knowledge.
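The core mechanism referred to above, learning transition probabilities between short-term acoustic events, can be illustrated with a minimal sketch. This assumes the acoustic signal has already been quantized into a sequence of discrete event labels (the labels, the `transition_probabilities` helper, and the example stream are all hypothetical, not the authors' implementation); a gap would simply surface as one more event label in the stream.

```python
from collections import defaultdict

def transition_probabilities(events):
    """Estimate first-order transition probabilities P(next | current)
    from a sequence of discrete event labels (unsupervised: counts only)."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(events, events[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        probs[cur] = {nxt: c / total for nxt, c in followers.items()}
    return probs

# Hypothetical stream of quantized short-term acoustic events;
# "gap" stands for a subliminal silent gap treated as its own event.
stream = ["a", "b", "gap", "a", "b", "a", "c"]
tp = transition_probabilities(stream)
print(tp["a"]["b"])  # relative frequency of "b" following "a"
```

In such a model the gap changes the estimated probabilities around it without any notion of syllables or words, which is the sense in which no a priori linguistic knowledge is required.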