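Content provided by The Data Bros and The Firebolt Data Bros. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Data Bros and The Firebolt Data Bros or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://ko.player.fm/legal.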

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal

21:38
 
In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how they revolutionized their search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performance search capabilities, and leveraged PostgreSQL extensions for complex retrieval operations. Whether you're scaling search functionality or optimizing database performance, this deep dive offers valuable insights into building robust, production-ready search systems using PostgreSQL.

  • Discover why Instacart moved from Elasticsearch to PostgreSQL for retailer search
  • Learn about handling real-time inventory updates and search optimization
  • Explore PostgreSQL extensions, sharding strategies, and data flow architecture
  • Understand the trade-offs between different search infrastructure approaches

What You'll Learn:

  • How Instacart managed fast-moving grocery inventory data by consolidating search, ranking, and filtering into a single PostgreSQL cluster
  • Why pushing compute closer to the data layer can significantly improve search performance and reduce network calls
  • The architecture decisions behind using PostgreSQL extensions like pgvector and custom solutions for search functionality
  • How to implement efficient data ingestion through S3-based pipelines and bulk writes instead of real-time updates
  • Why table maintenance operations like pg_repack are crucial for optimizing read throughput in production environments
  • The trade-offs between traditional search engines and relational databases for complex search implementations
  • The challenges of maintaining self-hosted PostgreSQL in a predominantly cloud-managed environment
If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here.

About the Guest(s)

Ankit is a Software Engineer at ParadeDB and former Senior Engineer at Instacart, where he specialized in PostgreSQL infrastructure and search systems. With extensive experience in database optimization and search architecture, he played a key role in modernizing Instacart's search infrastructure by transitioning from Elasticsearch to a custom PostgreSQL solution. In this episode, Ankit shares deep insights into building and scaling high-performance search systems for e-commerce, particularly focusing on the unique challenges of grocery retail's fast-moving inventory. His work at Instacart revolutionized their single-retailer search functionality, demonstrating how traditional relational databases can be adapted for complex search operations. His expertise in database systems and their practical applications in high-scale environments makes this conversation particularly valuable for engineers interested in modern search architecture and database optimization.

Quotes

"Think about it. If there's a lot of things that you can get the database to do, then the applications become simpler." - Ankit

"My non-Instacart experience has largely been in pre-PMF startups where the approach of abuse your database to its absolute limits works wonders." - Ankit

"Almost everything that we got retrieved had to be filtered out. So we go back to Elasticsearch again." - Ankit


"We traded off the quality of retrieval, hardcore core retrieval, with the whole system reducing the network calls." - Ankit

"It's a place to go to find what item is available, in what store, what item is available, at what price, including full product taxonomy graph and product and ontology." - Ankit

"The grand theme here is that we wanted more control over the cluster, how to spin it off, what kind of disks it would have." - Ankit

"We tell teams who want to have their data in this cluster, create an s3 home, create either a bucket or a home, whatever they want to do, and tell us that we would sync ourselves." - Ankit

"What we found is that the read throughput, we can throw more data if the tables are repacked nicely." - Ankit

"Most engineers who want to work on search, they are more used to the Elasticsearch shape of the query." - Ankit

"The relevance is better because they could join more things in the database. They also saw the cost of the normalized data reduced." - Ankit

Resources

Company Websites:

- Instacart - Grocery delivery platform

- ParadeDB - Database technology company

- Firebolt - Cloud data warehouse (firebolt.io)

Tools & Technologies:

- PostgreSQL - Database system
- Elasticsearch - Search engine
- PgCat/PgDog - PostgreSQL proxy tools
- pgvector - PostgreSQL vector extension
- pg_repack - PostgreSQL table repacking tool
- ClickHouse - Column-oriented DBMS
- Tantivy - Rust-based search engine library

Articles:

- Instacart Search Modernization Blog Posts (Series on hybrid retrieval)
- Target's AlloyDB Migration Blog Post

For Feedback & Discussions on Firebolt Core:


Primary Speakers:

The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so
Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.
Check out our three most downloaded episodes:

62 episodes
