The human ability to learn from sparse rewards has been modeled with the temporal difference learning mechanism, using an actor-critic architecture (Montague, Dayan, & Sejnowski, 1996). These models incorporate an "adaptive critic" which learns a "value function": a mapping from the learner's current situation to expected future reward. In complex environments, a "value function approximator" (VFA) must be implemented to allow generalization across similar situations. While some implementations of VFAs have been successful (Tesauro, 1992), this approach does not consistently converge to a solution (Boyan & Moore, 1995). With the goal of developing a general and reliable VFA mechanism that captures human-level learning performance, we have explored the use of spiking neural networks, including liquid state machines, as a technique for VFA learning in complex environments. We report on simulations demonstrating the benefits and pitfalls of using the temporal dynamics of neural spikes to encode the learner's state.
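
To make the adaptive-critic mechanism concrete, the following is a minimal sketch of TD(0) learning with a linear VFA on a toy random-walk task. It is illustrative only, not the spiking-network or liquid state machine implementation studied here; the one-hot features, learning rate, and environment are all assumptions chosen for demonstration.

```python
import numpy as np

# Minimal TD(0) sketch: a linear value function approximator (VFA)
# trained from the critic's reward-prediction error on a 5-state
# random walk (terminal at both ends, reward 1 at the right terminal).
# All parameters here are illustrative assumptions, not values from
# the simulations reported in the paper.

rng = np.random.default_rng(0)

n_states = 5   # non-terminal states of the random walk
alpha = 0.1    # learning rate
gamma = 1.0    # discount factor

def features(s):
    """One-hot state encoding; a stand-in for a richer state code."""
    phi = np.zeros(n_states)
    phi[s] = 1.0
    return phi

w = np.zeros(n_states)  # VFA weights: V(s) ~= w . phi(s)

for episode in range(1000):
    s = n_states // 2                     # start in the middle
    while True:
        s_next = s + rng.choice([-1, 1])  # random policy
        if s_next < 0:                    # left terminal: no reward
            r, v_next, done = 0.0, 0.0, True
        elif s_next >= n_states:          # right terminal: reward 1
            r, v_next, done = 1.0, 0.0, True
        else:
            r, v_next, done = 0.0, w @ features(s_next), False
        # TD error: the critic's reward-prediction error signal.
        delta = r + gamma * v_next - w @ features(s)
        w += alpha * delta * features(s)  # gradient step on VFA weights
        if done:
            break
        s = s_next

print(np.round(w, 2))  # approaches the true values 1/6, 2/6, ..., 5/6
```

With one-hot features this reduces to tabular TD(0), which converges reliably; the convergence problems cited above (Boyan & Moore, 1995) arise when the same update is combined with more general function approximators.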