Efficient reinforcement learning through variance reduction and trajectory synthesis