
Player FM 앱으로 오프라인으로 전환하세요!
John Schulman
Manage episode 344497731 series 2536330
John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.
Featured References
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
Additional References
- Our approach to alignment research, OpenAI 2022
- Training Verifiers to Solve Math Word Problems, Cobbe et al 2021
- UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017
- Proximal Policy Optimization Algorithms, Schulman 2017
- Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016
73 에피소드
Manage episode 344497731 series 2536330
John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.
Featured References
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
Additional References
- Our approach to alignment research, OpenAI 2022
- Training Verifiers to Solve Math Word Problems, Cobbe et al 2021
- UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017
- Proximal Policy Optimization Algorithms, Schulman 2017
- Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016
73 에피소드
모든 에피소드
×플레이어 FM에 오신것을 환영합니다!
플레이어 FM은 웹에서 고품질 팟캐스트를 검색하여 지금 바로 즐길 수 있도록 합니다. 최고의 팟캐스트 앱이며 Android, iPhone 및 웹에서도 작동합니다. 장치 간 구독 동기화를 위해 가입하세요.