Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://ko.player.fm/legal.

AF - [Aspiration-based designs] Outlook: dealing with complexity by Jobst Heitzig

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] Outlook: dealing with complexity, published by Jobst Heitzig on April 28, 2024 on The AI Alignment Forum.

Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.

Multi-dimensional aspirations

For real-world tasks that are specified in terms of more than one evaluation metric, e.g., how many apples to buy and how much money to spend at most, we can generalize Algorithm 2 from aspiration intervals to convex aspiration sets as follows. Assume there are d > 1 evaluation metrics ui, combined into a vector-valued evaluation metric u = (u1, …, ud).

Preparation: Pick d+1 linearly independent linear combinations fj in the space spanned by these metrics, and consider the d+1 policies πj, each of which maximizes the expected value of the corresponding function fj. Let Vj(s) and Qj(s,a) be the expected values of u when using πj in state s or after using action a in state s, respectively (see Fig. 1). Let the admissibility simplices V(s) and Q(s,a) be the simplices spanned by the vertices Vj(s) and Qj(s,a), respectively (red and violet triangles in Fig. 1). They replace the feasibility intervals used in Algorithm 2.

Policy: Given a convex state-aspiration set E(s) ⊆ V(s) (central green polyhedron in Fig. 1), compute its midpoint (centre of mass) m and consider the d+1 segments ℓj from m to the corners Vj(s) of V(s) (dashed black lines in Fig. 1). For each of these segments ℓj, let Aj be the (nonempty!) set of actions for which ℓj intersects Q(s,a). For each a ∈ Aj, compute the action-aspiration E(s,a) ⊆ Q(s,a) by shifting a copy Cj of E(s) along ℓj towards Vj(s) until the intersection of Cj and ℓj is contained in the intersection of Q(s,a) and ℓj (half-transparent green polyhedra in Fig. 1), and then intersecting Cj with Q(s,a) to give E(s,a) (yellow polyhedra in Fig. 1). Then pick one candidate action from each Aj and randomize between these d+1 actions in proportions such that the corresponding convex combination of the sets E(s,a) is included in E(s). Note that this is always possible because m is in the convex hull of the sets Cj, and the shapes of the sets E(s,a) "fit" into E(s) by construction.

Aspiration propagation: After observing the successor state s', the action-aspiration E(s,a) is rescaled linearly from Q(s,a) to V(s') to give the next state-aspiration E(s'); see Fig. 2. (We also consider other variants of this general idea.)

Hierarchical decision making

A common way of planning complex tasks is to decompose them into a hierarchy of two or more levels of subtasks. Similar to existing approaches from hierarchical reinforcement learning, we imagine that an AI system can make such hierarchical decisions as depicted in the following diagram (shown for only two hierarchical levels, but obviously generalizable to more levels):

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
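The aspiration-propagation step above, rescaling a set linearly from one simplex to another, can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes simplices are given as NumPy arrays of d+1 vertices in d dimensions, and represents an aspiration set by the vertices of a polytope. Rescaling then amounts to holding each point's barycentric coordinates fixed while swapping the reference simplex.

```python
import numpy as np

def barycentric_coords(point, simplex):
    """Barycentric coordinates of `point` with respect to a d-simplex,
    given as a (d+1, d) array of vertex coordinates."""
    # Express point - v_last in the basis of edge vectors (v_j - v_last).
    T = (simplex[:-1] - simplex[-1]).T              # shape (d, d)
    lam = np.linalg.solve(T, point - simplex[-1])   # first d coordinates
    return np.append(lam, 1.0 - lam.sum())          # coordinates sum to 1

def rescale_aspiration(points, src_simplex, dst_simplex):
    """Map each point linearly from src_simplex to dst_simplex by keeping
    its barycentric coordinates fixed -- the linear rescaling used to turn
    an action-aspiration E(s,a) over Q(s,a) into a state-aspiration over V(s')."""
    return np.array([barycentric_coords(p, src_simplex) @ dst_simplex
                     for p in points])

# Hypothetical example with d = 2 (triangles, as in Fig. 1):
Q_sa = np.array([[0., 0.], [1., 0.], [0., 1.]])   # stand-in for Q(s,a)
V_s2 = np.array([[0., 0.], [2., 0.], [0., 2.]])   # stand-in for V(s')
E_sa = np.array([[0.25, 0.25], [0.5, 0.25]])      # action-aspiration vertices
# maps (0.25, 0.25) -> (0.5, 0.5) and (0.5, 0.25) -> (1.0, 0.5)
print(rescale_aspiration(E_sa, Q_sa, V_s2))
```

Because the map is affine, rescaling the vertices of a convex aspiration polytope rescales the whole set, so no interior points need to be tracked.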