Learning policies for quadruped locomotion from scratch with reinforcement learning is challenging and motivates the need for behavioral priors. In this paper, we demonstrate that the combination of two such priors, gait trajectory generators and foot placement selection, is an effective means to train robust policies. We specifically aim to learn a locomotion policy over a terrain consisting of stepping stones, where footstep placements are limited. To this end, we add a behavioral prior for footstep placement by proposing a method that selects footstep targets which are optimal according to the value function of a policy trained to reach random footstep targets. We implement this method in simulation on flat ground and on a difficult stepping-stones terrain with some success, and we discuss directions for future work that could improve our approach.
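The footstep-selection idea above can be sketched as scoring each candidate foothold with the trained policy's value function and picking the highest-scoring one. The snippet below is a minimal illustration of that selection step; the state layout, candidate set, and `toy_value_fn` are illustrative assumptions, not the paper's actual interfaces.

```python
import numpy as np

def select_footstep(state, candidates, value_fn):
    """Return the candidate foothold with the highest estimated value.

    `value_fn(state, target)` stands in for the value function of a
    policy trained to reach random footstep targets (an assumption
    about its interface, for illustration only).
    """
    scores = np.array([value_fn(state, c) for c in candidates])
    return candidates[int(np.argmax(scores))]

# Toy stand-in value function: prefer footholds near a goal point (2, 0).
def toy_value_fn(state, target):
    goal = np.array([2.0, 0.0])
    return -np.linalg.norm(np.asarray(target) - goal)

if __name__ == "__main__":
    state = np.zeros(4)                  # placeholder robot state
    stones = [np.array([0.5, 0.1]),      # reachable stepping stones
              np.array([1.0, -0.2]),
              np.array([1.5, 0.0])]
    best = select_footstep(state, stones, toy_value_fn)
    print(best)                          # foothold closest to the goal
```

In practice the candidate set would come from the visible stepping stones within the leg's reach, and the value function from the pretrained target-reaching policy rather than a hand-written heuristic.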