Learning sequential actions is an essential ability, since most daily activities are sequential. We modify the trajectory serial reaction time (SRT) task, which teaches people a consistent sequence of mouse movements by cueing them with the next target response. We introduce a reinforcement learning (RL) version of the paradigm in which no cue appears. Instead, learners must explore response alternatives, receiving penalties for incorrect responses and rewards for correct ones. Learners are not told that they will learn a single deterministic sequence of responses, nor that it will repeat (nor how often), nor how long it is. Performance was bimodal: half of the learners performed poorly, while the other half performed remarkably well, acquiring the full 10-item sequence within 10 repetitions. We compare these groups’ detailed results in this RL task with those from a cued trajectory SRT task, finding both similarities and discrepancies. Human learners outperform three standard RL models and show different patterns of errors.