4 - Multi-Agent Reinforcement Learning in Sequential Social Dilemmas
Archived series ("Inactive feed" status)
When? This feed was archived on March 27, 2022 11:30. Last successful fetch was on May 16, 2021 09:06.
Why? Inactive feed status. A temporary server problem prevented the podcast from being loaded.
with Joel Z. Leibo
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
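To make the "mixed incentive structure" of a matrix game social dilemma concrete, here is a minimal Python sketch. It assumes the commonly used four-payoff formulation (R for mutual cooperation, P for mutual defection, S for cooperating against a defector, T for defecting against a cooperator) together with the usual inequality test on those payoffs. The class name, function name, and numeric payoffs are illustrative assumptions, not values taken from the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixGame:
    """Symmetric two-player game with atomic cooperate/defect actions."""
    R: float  # reward: both players cooperate
    P: float  # punishment: both players defect
    S: float  # sucker: I cooperate, the other player defects
    T: float  # temptation: I defect, the other player cooperates

    def payoff(self, my_action: str, other_action: str) -> float:
        """Row player's payoff; actions are 'C' (cooperate) or 'D' (defect)."""
        table = {
            ("C", "C"): self.R,
            ("C", "D"): self.S,
            ("D", "C"): self.T,
            ("D", "D"): self.P,
        }
        return table[(my_action, other_action)]


def is_social_dilemma(g: MatrixGame) -> bool:
    """Assumed test: cooperation is collectively better, yet defection tempts."""
    mutual_coop_beats_mutual_defect = g.R > g.P
    mutual_coop_beats_being_exploited = g.R > g.S
    mutual_coop_beats_taking_turns = 2 * g.R > g.T + g.S
    greed = g.T > g.R  # exploiting a cooperator pays more than cooperating
    fear = g.P > g.S   # mutual defection beats being exploited
    return (
        mutual_coop_beats_mutual_defect
        and mutual_coop_beats_being_exploited
        and mutual_coop_beats_taking_turns
        and (greed or fear)
    )


# Illustrative payoffs (assumed values): a Prisoner's Dilemma and a game
# in which neither greed nor fear is present.
prisoners_dilemma = MatrixGame(R=3, P=1, S=0, T=4)
no_dilemma = MatrixGame(R=4, P=1, S=2, T=3)
```

In this sketch each action is a single atomic choice, which is exactly the limitation the abstract points to: in a sequential social dilemma the cooperate or defect label would apply to a whole learned policy rather than to one entry of the table.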
5 episodes