Learning Optimal Policies With Local Observations for Cooperative Multiagent Reinforcement Learning

Cooperative multiagent reinforcement learning (MARL) has been widely used in many practical applications. Despite its success, MARL faces a fundamental issue: because of partial observability, agents face a dilemma between selecting the best action to maximize rewards and collectively acquiring more information by exploring novel states/actions. To address this issue, existing methods combine exploration and exploitation strategies. However, these methods are always suboptimal and may l