Reinforcement learning for energy-efficient multi-objective dynamic planning of hot rolling production