
Episode 45: Your AI application is broken. Here’s what to do about it.
Too many teams are building AI applications without truly understanding why their models fail. Instead of jumping straight to LLM evaluations, dashboards, or vibe checks, how do you actually fix a broken AI app?
In this episode, Hugo speaks with Hamel Husain, longtime ML engineer, open-source contributor, and consultant, about why debugging generative AI systems starts with looking at your data.
In this episode, we dive into:
- Why “look at your data” is the best debugging advice no one follows.
- How spreadsheet-based error analysis can uncover failure modes faster than complex dashboards.
- The role of synthetic data in bootstrapping evaluation.
- When to trust LLM judges—and when they’re misleading.
- Why most AI dashboards measuring truthfulness, helpfulness, and conciseness are a waste of time.
If you're building AI-powered applications, this episode will change how you approach debugging, iteration, and improving model performance in production.
LINKS