Translating Natural Language to Visually Grounded Verifiable Plans