Artwork

Machine Learning Street Talk (MLST)에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Machine Learning Street Talk (MLST) 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.
Player FM -팟 캐스트 앱
Player FM 앱으로 오프라인으로 전환하세요!

"I Co-Invented the Transformer. Now I'm Replacing It." & Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]

1:12:39
 
공유
 

Manage episode 520846118 series 2803422
Machine Learning Street Talk (MLST)에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Machine Learning Street Talk (MLST) 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument, and also introduce new research which might lead the way forwards.

**SPONSOR MESSAGES START**

Build your ideas with AI Studio from Google - http://ai.studio/build

Tufa AI Labs is hiring ML Research Engineers https://tufalabs.ai/

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

**END**

The "Spiral" Problem – Llion uses a striking visual analogy to explain what current AI is missing. If you ask a standard neural network to understand a spiral shape, it solves it by drawing tiny straight lines that just happen to look like a spiral. It "fakes" the shape without understanding the concept of spiraling.

Introducing the Continuous Thought Machine (CTM) Luke Darlow deep dives into their solution: a biology-inspired model that fundamentally changes how AI processes information.

The Maze Analogy: Luke explains that standard AI tries to solve a maze by staring at the whole image and guessing the entire path instantly. Their new machine "walks" through the maze step-by-step.

Thinking Time: This allows the AI to "ponder." If a problem is hard, the model can naturally spend more time thinking about it before answering, effectively allowing it to correct its own mistakes and backtrack—something current Language Models struggle to do genuinely.

https://sakana.ai/

https://x.com/YesThisIsLion

https://x.com/LearningLukeD

TRANSCRIPT:

https://app.rescript.info/public/share/crjzQ-Jo2FQsJc97xsBdfzfOIeMONpg0TFBuCgV2Fu8

TOC:

00:00:00 - Stepping Back from Transformers

00:00:43 - Introduction to Continuous Thought Machines (CTM)

00:01:09 - The Changing Atmosphere of AI Research

00:04:13 - Sakana’s Philosophy: Research Freedom

00:07:45 - The Local Minimum of Large Language Models

00:18:30 - Representation Problems: The Spiral Example

00:29:12 - Technical Deep Dive: CTM Architecture

00:36:00 - Adaptive Computation & Maze Solving

00:47:15 - Model Calibration & Uncertainty

01:00:43 - Sudoku Bench: Measuring True Reasoning

REFS:

Why Greatness Cannot be planned [Kenneth Stanley]

https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237

https://www.youtube.com/watch?v=lhYGXYeMq_E

The Hardware Lottery [Sara Hooker]

https://arxiv.org/abs/2009.06489

https://www.youtube.com/watch?v=sQFxbQ7ade0

Continuous Thought Machines [Luke Darlow et al / Sakana]

https://arxiv.org/abs/2505.05522

https://sakana.ai/ctm/

LSTM: The Comeback Story? [Prof. Sepp Hochreiter]

https://www.youtube.com/watch?v=8u2pW2zZLCs

Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]

https://arxiv.org/pdf/2505.11581

A Spline Theory of Deep Networks [Randall Balestriero]

https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

https://www.youtube.com/watch?v=86ib0sfdFtw

https://www.youtube.com/watch?v=l3O2J3LMxqI

On the Biology of a Large Language Model [Anthropic, Jack Lindsey et al]

https://transformer-circuits.pub/2025/attribution-graphs/biology.html

The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] “The ARChitects”

https://www.youtube.com/watch?v=mTX_sAq--zY

Neural Turing Machine [Graves]

https://arxiv.org/pdf/1410.5401

Adaptive Computation Time for Recurrent Neural Networks [Graves]

https://arxiv.org/abs/1603.08983

Sudoko Bench [Sakana]

https://pub.sakana.ai/sudoku/

  continue reading

238 에피소드

Artwork
icon공유
 
Manage episode 520846118 series 2803422
Machine Learning Street Talk (MLST)에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Machine Learning Street Talk (MLST) 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument, and also introduce new research which might lead the way forwards.

**SPONSOR MESSAGES START**

Build your ideas with AI Studio from Google - http://ai.studio/build

Tufa AI Labs is hiring ML Research Engineers https://tufalabs.ai/

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

**END**

The "Spiral" Problem – Llion uses a striking visual analogy to explain what current AI is missing. If you ask a standard neural network to understand a spiral shape, it solves it by drawing tiny straight lines that just happen to look like a spiral. It "fakes" the shape without understanding the concept of spiraling.

Introducing the Continuous Thought Machine (CTM) Luke Darlow deep dives into their solution: a biology-inspired model that fundamentally changes how AI processes information.

The Maze Analogy: Luke explains that standard AI tries to solve a maze by staring at the whole image and guessing the entire path instantly. Their new machine "walks" through the maze step-by-step.

Thinking Time: This allows the AI to "ponder." If a problem is hard, the model can naturally spend more time thinking about it before answering, effectively allowing it to correct its own mistakes and backtrack—something current Language Models struggle to do genuinely.

https://sakana.ai/

https://x.com/YesThisIsLion

https://x.com/LearningLukeD

TRANSCRIPT:

https://app.rescript.info/public/share/crjzQ-Jo2FQsJc97xsBdfzfOIeMONpg0TFBuCgV2Fu8

TOC:

00:00:00 - Stepping Back from Transformers

00:00:43 - Introduction to Continuous Thought Machines (CTM)

00:01:09 - The Changing Atmosphere of AI Research

00:04:13 - Sakana’s Philosophy: Research Freedom

00:07:45 - The Local Minimum of Large Language Models

00:18:30 - Representation Problems: The Spiral Example

00:29:12 - Technical Deep Dive: CTM Architecture

00:36:00 - Adaptive Computation & Maze Solving

00:47:15 - Model Calibration & Uncertainty

01:00:43 - Sudoku Bench: Measuring True Reasoning

REFS:

Why Greatness Cannot be planned [Kenneth Stanley]

https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237

https://www.youtube.com/watch?v=lhYGXYeMq_E

The Hardware Lottery [Sara Hooker]

https://arxiv.org/abs/2009.06489

https://www.youtube.com/watch?v=sQFxbQ7ade0

Continuous Thought Machines [Luke Darlow et al / Sakana]

https://arxiv.org/abs/2505.05522

https://sakana.ai/ctm/

LSTM: The Comeback Story? [Prof. Sepp Hochreiter]

https://www.youtube.com/watch?v=8u2pW2zZLCs

Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]

https://arxiv.org/pdf/2505.11581

A Spline Theory of Deep Networks [Randall Balestriero]

https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

https://www.youtube.com/watch?v=86ib0sfdFtw

https://www.youtube.com/watch?v=l3O2J3LMxqI

On the Biology of a Large Language Model [Anthropic, Jack Lindsey et al]

https://transformer-circuits.pub/2025/attribution-graphs/biology.html

The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] “The ARChitects”

https://www.youtube.com/watch?v=mTX_sAq--zY

Neural Turing Machine [Graves]

https://arxiv.org/pdf/1410.5401

Adaptive Computation Time for Recurrent Neural Networks [Graves]

https://arxiv.org/abs/1603.08983

Sudoko Bench [Sakana]

https://pub.sakana.ai/sudoku/

  continue reading

238 에피소드

모든 에피소드

×
 
Loading …

플레이어 FM에 오신것을 환영합니다!

플레이어 FM은 웹에서 고품질 팟캐스트를 검색하여 지금 바로 즐길 수 있도록 합니다. 최고의 팟캐스트 앱이며 Android, iPhone 및 웹에서도 작동합니다. 장치 간 구독 동기화를 위해 가입하세요.

 

빠른 참조 가이드

탐색하는 동안 이 프로그램을 들어보세요.
재생