This content is provided by Machine Learning Street Talk (MLST). All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Machine Learning Street Talk (MLST) or their podcast platform partner. If you believe someone is using your copyrighted work without permission, you can follow the process outlined here: https://ko.player.fm/legal.

Subbarao Kambhampati - Do o1 models search?

1:32:13
 


Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

* How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see

* The evolution from traditional Large Language Models to more sophisticated reasoning systems

* The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably

* Why O1's improved performance comes with substantial computational costs

* The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)

* The critical distinction between AI as an intelligence amplifier and AI as an autonomous decision-maker

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focused on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Go to https://tufalabs.ai/

***

TOC:

1. **O1 Architecture and Reasoning Foundations**

[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

2. **Monte Carlo Methods and Model Deep-Dive**

[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation

[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

[00:51:41] 2.5 O1 Response Patterns and Performance Analysis

3. **System Design and Real-World Applications**

[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

[01:26:08] 3.5 Hybrid Architecture Implementation Strategies

Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

REFS:

[00:02:00] Monty Python (1975)

Witch trial scene: flawed logical reasoning.

https://www.youtube.com/watch?v=zrzMhU_4m-g

[00:04:00] Cade Metz (2024)

Microsoft–OpenAI partnership evolution and control dynamics.

https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html

[00:07:25] Kojima et al. (2022)

Zero-shot chain-of-thought prompting ('Let's think step by step').

https://arxiv.org/pdf/2205.11916
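The Kojima et al. technique is remarkably simple: append a fixed trigger phrase to the question before sending it to the model. A minimal sketch (the question text is an illustrative placeholder; no API call is made here):

```python
# Minimal illustration of zero-shot chain-of-thought prompting (Kojima et al., 2022).
# The entire technique is appending a trigger phrase that elicits step-by-step
# reasoning before the final answer.

def build_zero_shot_cot_prompt(question: str) -> str:
    """Wrap a plain question with the zero-shot CoT trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = build_zero_shot_cot_prompt(
    "A juggler has 16 balls. Half are golf balls, and half of those are blue. "
    "How many blue golf balls are there?"
)
print(prompt)
```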

[00:12:50] DeepMind Research Team (2023)

Multi-bot game solving with external and internal planning.

https://deepmind.google/research/publications/139455/

[00:15:10] Silver et al. (2016)

AlphaGo's Monte Carlo Tree Search and Q-learning.

https://www.nature.com/articles/nature16961
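For listeners unfamiliar with the search component of AlphaGo that comes up throughout the episode, here is a self-contained sketch of Monte Carlo Tree Search with UCT selection on a toy counting game (players alternately add 1 or 2; whoever reaches 10 wins). The game and hyperparameters are illustrative, not from the paper:

```python
import math
import random

TARGET = 10  # the player who moves the running count to exactly TARGET wins

def moves(state):
    return [m for m in (1, 2) if state + m <= TARGET]

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = moves(state)
        self.visits = 0
        self.wins = 0.0  # from the viewpoint of the player who moved INTO this node

def select(node, c=1.4):
    # UCT: balance exploitation (win rate) against exploration (visit counts)
    return max(node.children, key=lambda ch:
               ch.wins / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state):
    # Random playout; returns 1 if the player to move at `state` wins, else 0
    turn = 0
    while state < TARGET:
        state += random.choice(moves(state))
        if state == TARGET:
            return 1 if turn == 0 else 0
        turn ^= 1
    return 0

def mcts(root_state, iters=5000):
    random.seed(0)
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while the node is fully expanded
        while not node.untried and node.children:
            node = select(node)
        # 2. Expansion: add one unexplored child
        if node.untried:
            m = node.untried.pop()
            child = Node(node.state + m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation, scored for the player who moved into `node`
        reward = 1 - rollout(node.state)
        # 4. Backpropagation, flipping the perspective at each level
        while node:
            node.visits += 1
            node.wins += reward
            reward = 1 - reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(0))
```

AlphaGo replaces the random rollout with a value network and biases selection with a policy network, but the select/expand/simulate/backpropagate loop is the same.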

[00:16:30] Kambhampati, S. et al. (2024)

Evaluates O1's planning in "Strawberry Fields" benchmarks.

https://arxiv.org/pdf/2410.02162

[00:29:30] Alibaba AIDC-AI Team (2024)

MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.

https://arxiv.org/html/2411.14405

[00:31:30] Kambhampati, S. (2024)

Explores LLM "reasoning vs retrieval" debate.

https://arxiv.org/html/2403.04121v2

[00:37:35] Wei, J. et al. (2022)

Chain-of-thought prompting (introduces last-letter concatenation).

https://arxiv.org/pdf/2201.11903
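The last-letter-concatenation task from Wei et al. is trivial as a program, which is exactly why it is a useful probe: models without chain-of-thought prompting often fail at what is a two-line function.

```python
# Last-letter concatenation (Wei et al., 2022): join the last letter of each
# word in a name. E.g. "Elon Musk" -> "n" + "k" -> "nk".

def last_letter_concat(name: str) -> str:
    return "".join(word[-1] for word in name.split())

print(last_letter_concat("Elon Musk"))   # -> "nk"
print(last_letter_concat("Bill Gates"))  # -> "ls"
```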

[00:42:35] Barbero, F. et al. (2024)

Transformer attention and "information over-squashing."

https://arxiv.org/html/2406.04267v2

[00:46:05] Ruis, L. et al. (2024)

Influence functions to understand procedural knowledge in LLMs.

https://arxiv.org/html/2411.12580v1

(truncated - continued in shownotes/transcript doc)


232 episodes

