Transformers Need Glasses! - Federico Barbero

1:00:54

Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!".

Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.
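A toy NumPy sketch of that fidelity problem (an illustration in the spirit of the paper, not its exact construction): assuming roughly uniform attention over a long run of repeated tokens, the final token contributes only O(1/n) to the last-position representation, so at low precision a sequence ending in "0" becomes numerically indistinguishable from one ending in "1", which is precisely when copying the last token or counting breaks down.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
e_one = rng.standard_normal(d)    # toy embedding of the repeated token "1"
e_zero = rng.standard_normal(d)   # toy embedding of the final token "0"

for n in [10, 100, 1_000, 10_000, 100_000]:
    # Near-uniform attention over n copies of "1" plus one final token:
    out_ends_zero = (n * e_one + e_zero) / (n + 1)   # sequence 1 1 ... 1 0
    out_ends_one = (n * e_one + e_one) / (n + 1)     # sequence 1 1 ... 1 1
    gap = np.linalg.norm(out_ends_zero - out_ends_one)
    print(f"n={n:7d}  ||difference|| = {gap:.2e}  "
          f"(fp16 eps ~ {np.finfo(np.float16).eps:.1e})")
```

The gap shrinks like 1/(n+1); once it falls below the resolution of a low-precision format, the two sequences collapse onto the same representation.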

Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making.
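On the softmax point, a hedged sketch (not the paper's exact bound): if attention logits are bounded, the largest weight softmax can place on any single token shrinks toward uniform as the context grows, so attention cannot stay perfectly "sharp", i.e. it cannot implement a hard "pick exactly this one token" over arbitrarily long inputs.

```python
import numpy as np

def max_softmax_weight(n, B):
    # One token gets the largest allowed logit (+B), the other n-1 get -B.
    logits = np.full(n, -B, dtype=np.float64)
    logits[0] = B
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w[0]

for n in [16, 256, 4096, 65536, 1048576]:
    print(f"n={n:8d}  max attention weight = {max_softmax_weight(n, B=5.0):.4f}")
```

Even in this best case for the "chosen" token, its weight decays toward 1/n as the sequence length grows.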

But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super-fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and host events in Zurich.

Go to https://tufalabs.ai/

***

https://federicobarbero.com/

TRANSCRIPT + RESEARCH:

https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0

TOC:

1. Transformer Limitations: Token Detection & Representation

[00:00:00] 1.1 Transformers fail at single token detection

[00:02:45] 1.2 Representation collapse in transformers

[00:03:21] 1.3 Experiment: LLMs fail at copying last tokens

[00:18:00] 1.4 Attention sharpness limitations in transformers

2. Transformer Limitations: Information Flow & Quantization

[00:18:50] 2.1 Unidirectional information mixing

[00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers

[00:21:50] 2.3 Diagonal attention heads as expensive no-ops in Llama/Gemma

[00:27:14] 2.4 Sequence entropy affects transformer model distinguishability

[00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse

[00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms

3. Transformers and the Nature of Reasoning

[00:40:30] 3.1 Turing completeness conditions in transformers

[00:43:23] 3.2 Transformers struggle with sequential tasks

[00:45:50] 3.3 Windowed attention as solution to information compression

[00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning

[01:00:35] 3.5 Epistemic foraging introduced

REFS:

[00:01:05] Transformers Need Glasses!, Barbero et al.

https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf

[00:05:30] Softmax is Not Enough, Veličković et al.

https://arxiv.org/abs/2410.01104

[00:11:30] Adv Alg Lecture 15, Chawla

https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf

[00:15:05] Graph Attention Networks, Veličković

https://arxiv.org/abs/1710.10903

[00:19:15] Extract Training Data, Carlini et al.

https://arxiv.org/pdf/2311.17035

[00:31:30] 1-bit LLMs, Ma et al.

https://arxiv.org/abs/2402.17764

[00:38:35] LLMs Solve Math, Nikankin et al.

https://arxiv.org/html/2410.21272v1

[00:38:45] Subitizing, Railo

https://link.springer.com/10.1007/978-1-4419-1428-6_578

[00:43:25] NN & Chomsky Hierarchy, Delétang et al.

https://arxiv.org/abs/2207.02098

[00:51:05] Measure of Intelligence, Chollet

https://arxiv.org/abs/1911.01547

[00:52:10] AlphaZero, Silver et al.

https://pubmed.ncbi.nlm.nih.gov/30523106/

[00:55:10] Golden Gate Claude, Anthropic

https://www.anthropic.com/news/golden-gate-claude

[00:56:40] Chess Positions, Chase & Simon

https://www.sciencedirect.com/science/article/abs/pii/0010028573900042

[01:00:35] Epistemic Foraging, Friston

https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full
