
Ian Osband
Ian Osband is a research scientist at OpenAI (ex-DeepMind, Stanford) working on decision making under uncertainty.
We spoke about:
- Information theory and RL
- Exploration, epistemic uncertainty and joint predictions
- Epistemic Neural Networks and scaling to LLMs
Featured References
Reinforcement Learning, Bit by Bit
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen
From Predictions to Decisions: The Importance of Joint Predictive Distributions
Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy
Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy
Approximate Thompson Sampling via Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy
Additional References
- Thesis defence, Ian Osband
- Homepage, Ian Osband
- Epistemic Neural Networks at Stanford RL Forum
- Behaviour Suite for Reinforcement Learning, Osband et al. 2019
- Efficient Exploration for LLMs, Dwaracherla et al. 2024