Adaptive queue management in healthcare using supervised Q-learning with time-varying reward and cost structures