Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences