AF - Towards a formalization of the agent structure problem by Alex Altair

The Nonlinear Library: Alignment Forum


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards a formalization of the agent structure problem, published by Alex Altair on April 29, 2024 on The AI Alignment Forum.

In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is, does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture (or structure)?

Of course, this question is heavily under-specified. So another way to ask this is, under which conditions does agent-like behavior imply agent-like structure? And, do those conditions feel like they formally encapsulate a naturally occurring condition?

For the Q1 2024 cohort of AI Safety Camp, I was a Research Lead for a team of six people, where we worked a few hours a week to better understand and make progress on this idea. The teammates[1] were Einar Urdshals, Tyler Tracy, Jasmina Nasufi, Mateusz Bagiński, Amaury Lorin, and Alfred Harwood.

The AISC project duration was too short to find and prove a theorem version of the problem. Instead, we investigated questions like:

- What existing literature is related to this question?
- What are the implications of using different types of environment classes?
- What could "structure" mean, mathematically? What could "modular" mean?
- What could it mean, mathematically, for something to be a model of something else?
- What could a "planning module" look like? How does it relate to "search"?
- Can the space of agent-like things be broken up into sub-types? What exactly is a "heuristic"?

Other posts on our progress may come out later. For this post, I'd like to simply help concretize the problem that we wish to make progress on.

What are "agent behavior" and "agent structure"?

When we say that something exhibits agent behavior, we mean that it seems to make the trajectory of the system go a certain way. We mean that, instead of the "default" way that a system might evolve over time, the presence of this agent-like thing makes it go some other way. The more specific a target it seems to hit, the more agentically we say it behaves. On LessWrong, the word "optimization" is often used for this type of system behavior. So that's the behavior that we're gesturing toward.

Seeing this behavior, one might say that the thing seems to want something, and tries to get it. It seems to somehow choose actions which steer the future toward the thing that it wants. If it does this across a wide range of environments, then it seems like it must be paying attention to what happens around it, using that information to infer how the world around it works, and using that model of the world to figure out which actions would be more likely to lead to the outcomes it wants. This is a vague description of a type of structure. That is, it's a description of a type of process happening inside the agent-like thing.

So, exactly when does the observation that something robustly optimizes imply that it has this kind of process going on inside it?

Our slightly more specific working hypothesis is that agent-like structure consists of three parts: a world-model, a planning module, and a representation of the agent's values. The world-model is very roughly like Bayesian inference; it starts out ignorant about what world it's in, and updates as observations come in. The planning module somehow identifies candidate actions, and then uses the world-model to predict their outcomes. And the representation of its values is used to select which outcome is preferred. It then takes the corresponding action.

This may sound to you like an algorithm for utility maximization.
But a big part of the idea behind the agent structure problem is that ther...
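
As a rough illustration of the three-part working hypothesis described above (world-model, planning module, values), here is a minimal Python sketch. It is not taken from the post: the class and function names, the finite set of candidate worlds, the two-world "treasure" example, and the 0.8/0.2 likelihoods are all assumptions made up for illustration, and real agent-like systems need not be organized this cleanly.

```python
from typing import Callable, Dict, List

# Every name here is hypothetical; the post does not specify an implementation.
World = str
Action = str
Observation = str
Outcome = str


class WorldModel:
    """Belief over a finite set of candidate worlds, updated by Bayes' rule."""

    def __init__(self, worlds: List[World],
                 likelihood: Callable[[Observation, World], float]):
        # Start out ignorant: a uniform prior over the candidate worlds.
        self.belief: Dict[World, float] = {w: 1.0 / len(worlds) for w in worlds}
        self.likelihood = likelihood

    def update(self, obs: Observation) -> None:
        # posterior(w) is proportional to prior(w) * P(obs | w)
        unnorm = {w: p * self.likelihood(obs, w) for w, p in self.belief.items()}
        total = sum(unnorm.values())
        if total == 0:
            raise ValueError("observation has zero probability in every world")
        self.belief = {w: p / total for w, p in unnorm.items()}


def plan(model: WorldModel,
         actions: List[Action],
         predict: Callable[[World, Action], Outcome],
         utility: Callable[[Outcome], float]) -> Action:
    """Planning module: score each candidate action by expected utility."""

    def expected_utility(action: Action) -> float:
        # Use the world-model to predict outcomes, and the values to rank them.
        return sum(p * utility(predict(w, action))
                   for w, p in model.belief.items())

    return max(actions, key=expected_utility)


# Toy environment (assumed for illustration): treasure is on the left or right.
def hint_likelihood(obs: Observation, world: World) -> float:
    # A hint points at the true side 80% of the time.
    return 0.8 if obs.split("_")[1] == world.split("_")[1] else 0.2


def predict_outcome(world: World, action: Action) -> Outcome:
    return "treasure" if action.split("_")[1] == world.split("_")[1] else "nothing"


def values(outcome: Outcome) -> float:
    return 1.0 if outcome == "treasure" else 0.0


model = WorldModel(["treasure_left", "treasure_right"], hint_likelihood)
model.update("hint_left")
chosen = plan(model, ["go_left", "go_right"], predict_outcome, values)
```

In this toy setup a single "hint_left" observation shifts the belief toward the left-hand world, and the planner then picks "go_left". The sketch keeps the three parts as separate objects only to mirror the description; whether behavior alone implies such a separation is exactly what the problem asks.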
