Artwork

Soroush Pour에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Soroush Pour 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.
Player FM -팟 캐스트 앱
Player FM 앱으로 오프라인으로 전환하세요!

Ep 10 - Accelerated training to become an AI safety researcher w/ Ryan Kidd (Co-Director, MATS)

1:16:58
 
공유
 

Manage episode 382650717 series 3428190
Soroush Pour에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Soroush Pour 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

We speak with Ryan Kidd, Co-Director at ML Alignment & Theory Scholars (MATS) program, previously "SERI MATS".
MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment.
Prior to MATS, Ryan completed a PhD in Physics at the University of Queensland (UQ) in Australia.
We talk about:
* What the MATS program is
* Who should apply to MATS (next *deadline*: Nov 17 midnight PT)
* Research directions being explored by MATS mentors, now and in the past
* Promising alignment research directions & ecosystem gaps , in Ryan's view
Hosted by Soroush Pour. Follow me for more AGI content:
* Twitter: https://twitter.com/soroushjp
* LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Ryan --
* Twitter: https://twitter.com/ryan_kidd44
* LinkedIn: https://www.linkedin.com/in/ryan-kidd-1b0574a3/
* MATS: https://www.matsprogram.org/
* LISA: https://www.safeai.org.uk/
* Manifold: https://manifold.markets/
-- Further resources --
* Book: “The Precipice” - https://theprecipice.com/
* Ikigai - https://en.wikipedia.org/wiki/Ikigai
* Fermi paradox - https://en.wikipedia.org/wiki/Fermi_p...
* Ajeya Contra - Bioanchors - https://www.cold-takes.com/forecastin...
* Chomsky hierarchy & LLM transformers paper + external memory - https://en.wikipedia.org/wiki/Chomsky...
* AutoGPT - https://en.wikipedia.org/wiki/Auto-GPT
* BabyAGI - https://github.com/yoheinakajima/babyagi
* Unilateralist's curse - https://forum.effectivealtruism.org/t...
* Jeffrey Ladish & team - fine tuning to remove LLM safeguards - https://www.alignmentforum.org/posts/...
* Epoch AI trends - https://epochai.org/trends
* The demon "Moloch" - https://slatestarcodex.com/2014/07/30...
* AI safety fundamentals course - https://aisafetyfundamentals.com/
* Anthropic sycophancy paper - https://www.anthropic.com/index/towar...
* Promising technical alignment research directions
* Scalable oversight
* Recursive reward modelling - https://deepmindsafetyresearch.medium...
* RLHF - could work for a while, but unlikely forever as we scale
* Interpretability
* Mechanistic interpretability
* Paper: GPT4 labelling GPT2 - https://openai.com/research/language-...
* Concept based interpretability
* Rome paper - https://rome.baulab.info/
* Developmental interpretability
* devinterp.com - http://devinterp.com
* Timaeus - https://timaeus.co/
* Internal consistency
* Colin Burns research - https://arxiv.org/abs/2212.03827
* Threat modelling / capabilities evaluation & demos
* Paper: Can large language models democratize access to dual-use biotechnology? - https://arxiv.org/abs/2306.03809
* ARC Evals - https://evals.alignment.org/
* Palisade Research - https://palisaderesearch.org/
* Paper: Situational awareness with Owain Evans - https://arxiv.org/abs/2309.00667
* Gradient hacking - https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
* Past scholar's work
* Apollo Research - https://www.apolloresearch.ai/
* Leap Labs - https://www.leap-labs.com/
* Timaeus - https://timaeus.co/
* Other orgs mentioned
* Redwood Research - https://redwoodresearch.org/
Recorded Oct 25, 2023

  continue reading

15 에피소드

Artwork
icon공유
 
Manage episode 382650717 series 3428190
Soroush Pour에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Soroush Pour 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

We speak with Ryan Kidd, Co-Director at ML Alignment & Theory Scholars (MATS) program, previously "SERI MATS".
MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment.
Prior to MATS, Ryan completed a PhD in Physics at the University of Queensland (UQ) in Australia.
We talk about:
* What the MATS program is
* Who should apply to MATS (next *deadline*: Nov 17 midnight PT)
* Research directions being explored by MATS mentors, now and in the past
* Promising alignment research directions & ecosystem gaps , in Ryan's view
Hosted by Soroush Pour. Follow me for more AGI content:
* Twitter: https://twitter.com/soroushjp
* LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Ryan --
* Twitter: https://twitter.com/ryan_kidd44
* LinkedIn: https://www.linkedin.com/in/ryan-kidd-1b0574a3/
* MATS: https://www.matsprogram.org/
* LISA: https://www.safeai.org.uk/
* Manifold: https://manifold.markets/
-- Further resources --
* Book: “The Precipice” - https://theprecipice.com/
* Ikigai - https://en.wikipedia.org/wiki/Ikigai
* Fermi paradox - https://en.wikipedia.org/wiki/Fermi_p...
* Ajeya Contra - Bioanchors - https://www.cold-takes.com/forecastin...
* Chomsky hierarchy & LLM transformers paper + external memory - https://en.wikipedia.org/wiki/Chomsky...
* AutoGPT - https://en.wikipedia.org/wiki/Auto-GPT
* BabyAGI - https://github.com/yoheinakajima/babyagi
* Unilateralist's curse - https://forum.effectivealtruism.org/t...
* Jeffrey Ladish & team - fine tuning to remove LLM safeguards - https://www.alignmentforum.org/posts/...
* Epoch AI trends - https://epochai.org/trends
* The demon "Moloch" - https://slatestarcodex.com/2014/07/30...
* AI safety fundamentals course - https://aisafetyfundamentals.com/
* Anthropic sycophancy paper - https://www.anthropic.com/index/towar...
* Promising technical alignment research directions
* Scalable oversight
* Recursive reward modelling - https://deepmindsafetyresearch.medium...
* RLHF - could work for a while, but unlikely forever as we scale
* Interpretability
* Mechanistic interpretability
* Paper: GPT4 labelling GPT2 - https://openai.com/research/language-...
* Concept based interpretability
* Rome paper - https://rome.baulab.info/
* Developmental interpretability
* devinterp.com - http://devinterp.com
* Timaeus - https://timaeus.co/
* Internal consistency
* Colin Burns research - https://arxiv.org/abs/2212.03827
* Threat modelling / capabilities evaluation & demos
* Paper: Can large language models democratize access to dual-use biotechnology? - https://arxiv.org/abs/2306.03809
* ARC Evals - https://evals.alignment.org/
* Palisade Research - https://palisaderesearch.org/
* Paper: Situational awareness with Owain Evans - https://arxiv.org/abs/2309.00667
* Gradient hacking - https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
* Past scholar's work
* Apollo Research - https://www.apolloresearch.ai/
* Leap Labs - https://www.leap-labs.com/
* Timaeus - https://timaeus.co/
* Other orgs mentioned
* Redwood Research - https://redwoodresearch.org/
Recorded Oct 25, 2023

  continue reading

15 에피소드

모든 에피소드

×
 
Loading …

플레이어 FM에 오신것을 환영합니다!

플레이어 FM은 웹에서 고품질 팟캐스트를 검색하여 지금 바로 즐길 수 있도록 합니다. 최고의 팟캐스트 앱이며 Android, iPhone 및 웹에서도 작동합니다. 장치 간 구독 동기화를 위해 가입하세요.

 

빠른 참조 가이드

탐색하는 동안 이 프로그램을 들어보세요.
재생