18 - Concept Extrapolation with Stuart Armstrong
Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different - like learning that the world works via special relativity, or seeing a picture of a novel sausage-bread combination. For a while, Stuart Armstrong has been thinking about concept extrapolation and how it relates to AI alignment. In this episode, we discuss where his thinking currently stands, how concept extrapolation relates to AI alignment, and what the open questions are.
Topics we discuss, and timestamps:
- 00:00:44 - What is concept extrapolation
- 00:15:25 - When is concept extrapolation possible
- 00:30:44 - A toy formalism
- 00:37:25 - Uniqueness of extrapolations
- 00:48:34 - Unity of concept extrapolation methods
- 00:53:25 - Concept extrapolation and corrigibility
- 00:59:51 - Is concept extrapolation possible?
- 01:37:05 - Misunderstandings of Stuart's approach
- 01:44:13 - Following Stuart's work
The transcript: axrp.net/episode/2022/09/03/episode-18-concept-extrapolation-stuart-armstrong.html
Stuart's startup, Aligned AI: aligned-ai.com
Research we discuss:
- The Concept Extrapolation sequence: alignmentforum.org/s/u9uawicHx7Ng7vwxA
- The HappyFaces benchmark: github.com/alignedai/HappyFaces
- Goal Misgeneralization in Deep Reinforcement Learning: arxiv.org/abs/2105.14111