LW - We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap, published by johnswentworth on September 19, 2024 on LessWrong.
Background: "Learning" vs "Learning About"
Adaptive systems, reinforcement "learners", etc, "learn" in the sense that their behavior adapts to their environment.
Bayesian reasoners, human scientists, etc, "learn" in the sense that they have some symbolic representation of the environment, and they update those symbols over time to (hopefully) better match the environment (i.e. make the map better match the territory).
These two kinds of "learning" are not synonymous[1]. Adaptive systems "learn" things, but they don't necessarily "learn about" things; they don't necessarily have an internal map of the external territory.
(Yes, the active inference folks will bullshit about how any adaptive system must have a map of the territory, but their math does not substantively support that interpretation.) The internal heuristics or behaviors "learned" by an adaptive system are not necessarily "about" any particular external thing, and don't necessarily represent any particular external thing[2].
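To make the distinction concrete, here is a minimal sketch (my own illustration, not from the post, using a made-up two-armed bandit environment): the first agent adapts its behavior by reinforcement without representing anything about the environment, while the second maintains an explicit belief - a "map" - of each arm's payout rate and updates it with Bayes' rule.

```python
import random

# Hypothetical environment: arm 0 pays out 30% of the time, arm 1 pays out 70%.
TRUE_PAYOUT = [0.3, 0.7]

def pull(arm):
    return 1.0 if random.random() < TRUE_PAYOUT[arm] else 0.0

class AdaptiveAgent:
    """'Learns' without 'learning about': its state is just action propensities.
    Nothing in it is a claim about the environment, so nothing in it can be
    true or false of the world."""
    def __init__(self):
        self.propensity = [1.0, 1.0]  # unnormalized preference for each arm

    def act(self):
        total = sum(self.propensity)
        return 0 if random.random() < self.propensity[0] / total else 1

    def update(self, arm, reward):
        # Reinforce whatever got rewarded; behavior adapts, no map is built.
        self.propensity[arm] += reward

class BayesianAgent:
    """'Learns about': its state is a Beta posterior over each arm's payout
    rate, i.e. a map that can match or mismatch the territory."""
    def __init__(self):
        self.alpha = [1.0, 1.0]  # observed successes + 1
        self.beta = [1.0, 1.0]   # observed failures + 1

    def estimate(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])

    def update(self, arm, reward):
        if reward > 0:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

if __name__ == "__main__":
    adaptive, bayesian = AdaptiveAgent(), BayesianAgent()
    for _ in range(1000):
        arm = adaptive.act()
        r = pull(arm)
        adaptive.update(arm, r)   # both agents see the same experience
        bayesian.update(arm, r)
    print("adaptive propensities:", adaptive.propensity)
    print("bayesian estimates:   ", [round(bayesian.estimate(a), 2) for a in (0, 1)])
```

The adaptive agent's propensities drift toward the better arm, but they don't represent the payout rates; the Bayesian agent's posterior means, by contrast, are explicitly about the environment and tend to track the true rates for the arms it gets data on.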
We Humans Learn About Our Values
"I thought I wanted X, but then I tried it and it was pretty meh."
"For a long time I pursued Y, but now I think that was more a social script than my own values."
"As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight."
The ubiquity of these sorts of sentiments is the simplest evidence that we do not typically know our own values[3]. Rather, people often (but not always) have some explicit best guess at their own values, and that guess updates over time - i.e. we can learn about our own values.
Note the wording here: we're not just saying that human values are "learned" in the more general sense of reinforcement learning. We're saying that we humans have some internal representation of our own values, a "map" of our values, and we update that map in response to evidence. Look again at the examples at the beginning of this section:
"I thought I wanted X, but then I tried it and it was pretty meh."
"For a long time I pursued Y, but now I think that was more a social script than my own values."
"As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight."
Notice that the wording of each example involves beliefs about values. They're not just saying "I used to feel urge X, but now I feel urge Y". They're saying "I thought I wanted X" - a belief about a value! Or "now I think that was more a social script than my own values" - again, a belief about my own values, and how those values relate to my (previous) behavior. Or "I endorsed the view that Z is the highest objective" - an explicit endorsement of a belief about values.
That's how we normally, instinctively reason about our own values. And sure, we could reword everything to avoid talking about our beliefs about values - "learning" is more general than "learning about" - but the fact that it makes sense to us to talk about our beliefs about values is strong evidence that something in our heads in fact works like beliefs about values, not just reinforcement-style "learning".
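As a purely illustrative toy model (my own, with hypothetical numbers), one can picture that "map of values" as an explicit probability over hypotheses like "I genuinely value X" versus "I only thought I valued X", updated by Bayes' rule on evidence such as actually trying X:

```python
# Toy illustration: an explicit belief about one's own values, updated on
# the evidence "I tried X and it was pretty meh." All numbers are assumed.
prior = {
    "I genuinely value X": 0.7,
    "I only believed I valued X (social script, etc.)": 0.3,
}

# Assumed likelihood of finding X "meh" under each hypothesis.
likelihood_meh = {
    "I genuinely value X": 0.1,
    "I only believed I valued X (social script, etc.)": 0.8,
}

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalized = {h: prior[h] * likelihood_meh[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

for h, p in posterior.items():
    print(f"{h}: {p:.2f}")
# The belief *about* the value shifts (here from 0.70 to roughly 0.23), even if
# the underlying urges haven't changed - that's the sense in which we learn
# about our own values.
```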
Two Puzzles
Puzzle 1: Learning About Our Own Values vs The Is-Ought Gap
Very roughly speaking, an agent could aim to pursue any values regardless of what the world outside it looks like; "how the external world is" does not tell us "how the external world should be". So when we "learn about" values, where does the evidence about values come from? How do we cross the is-ought gap?
Puzzle 2: The Role of Reward/Reinforcement
It does seem like humans have some kind of physiological "reward", in a hand-wavy reinforcement-learning-esque sense, which seems to at l...