Best Data Engineering Podcasts (2025)

Blurring Lines: Data, AI, and the New Playbook for Team Velocity 1:00:57

10d ago

In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they n…

How Covestro Turns Airflow Into a Simulation Toolbox with Anja Mackenzie 23:10

14d ago

Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability. In this episode, Anja MacKenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry. …

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu 19:55

15d ago

What does MLOps look like when you are deploying 22,000 models a month? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her tea…

State, Scale, and Signals: Rethinking Orchestration with Durable Execution 51:46

17d ago

In this episode, Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model (workflows, activities, task queues, and replay) and how it eliminates hand‑rolled retry, checkpoint, and…

Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin 21:16

21d ago

The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration shows how secure, auditable workflows can scale even within the constraints of regulated environments. In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team lev…

The AI Data Paradox: High Trust in Models, Low Trust in Data 51:35

24d ago

In this episode of the Data Engineering Podcast, Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems,…

How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar 23:48

28d ago

The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat. In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems. We discuss how the team restructures its data platform with Airflow to strengthen gove…

Bridging the AI–Data Gap: Collect, Curate, Serve 50:40

1M ago

In this episode of the Data Engineering Podcast, Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle …

How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov 24:28

1M ago

The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data. In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entit…

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access 1:05:00

1M ago

In this episode of the Data Engineering Podcast, Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vect…

Episode 3: The Pipeline Pit Crew: Monitoring, Troubleshooting, and Optimizing Your AWS Data 12:36

1M ago

Keep your data pipelines running smoothly! This episode covers Domain 3 (22% of the DEA-C01 exam). We dive into setting up alarms with CloudWatch, troubleshooting stuck jobs with Glue Logs, optimizing performance and cost in Redshift, and ensuring data quality with AWS Glue DataBrew.

By James

Episode 4: The Data Fortress: Securing and Governing Data for the DEA-C01 12:20

1M ago

Lock down your data platform! This is the final domain, Domain 4 (18% of the DEA-C01 exam). We cover essential security best practices: using IAM and Lake Formation for access control, enforcing encryption with KMS (at rest and in transit), and securing network access via VPC and Security Groups.

By James

Episode 2: AWS Data Store Mastery 14:16

1M ago

Where should you put your data? We tackle Domain 2 (26% of the DEA-C01 exam) by comparing Redshift, DynamoDB, and RDS. Learn how to design optimal schemas, use the AWS Glue Data Catalog, and implement S3 Lifecycle Policies to manage data lifespan and control costs.

By James

Episode 1: Mastering the AWS Data Assembly Line 18:05

1M ago

This is the essential guide to Domain 1: Data Ingestion and Transformation—the biggest section (34%) of the AWS Certified Data Engineer - Associate (DEA-C01) exam! We break down the core components of a successful data pipeline. Learn to compare Batch vs. Streaming with services like Kinesis and DMS, master ETL/ELT using AWS Glue and EMR, and orche…

Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo 29:36

1M ago

The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines. In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.…

The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies 1:04:16

2M ago

In this episode, Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems, as well as practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and cr…

Transforming Data Pipelines at XENA Intelligence with Naseem Shah 28:32

2M ago

The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities. In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed A…

Context Engineering as a Discipline: Building Governed AI Analytics 51:58

2M ago

In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass, a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and a…

Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith 24:03

2M ago

Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed. In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Soluti…

Block Bad Data Before the Write with Nike’s Ashok Singamaneni 20:20

2M ago

By The Firebolt Data Bros

The Data Model That Captures Your Business: Metric Trees Explained 1:01:05

2M ago

In this episode of the Data Engineering Podcast, Vijay Subramanian, founder and CEO of Trace, talks about metric trees, a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has…

Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya 19:04

2M ago

PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible. In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team m…

From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra 56:31

2M ago

In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting…

Building a Unified Data Platform at Pattern with William Graham 24:09

2M ago

The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines. In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open…

How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty 25:34

3M ago

The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations. In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey …

From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture 52:58

3M ago

In this episode of the AI Engineering Podcast, Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Ma…

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal 21:38

3M ago

In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how they revolutionized their search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performanc…

Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov 19:26

3M ago

The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics. In this episode, Evgenii Prusov, Senior Data Platform Engineer at Daiichi Sankyo Europe GmbH, joins us to discuss building a…

Duck Lake: Simplifying the Lakehouse Ecosystem 1:10:41

3M ago

In this episode of the Data Engineering Podcast, Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake focuses on simplicity and flexibility and offers a unified catalog and table format compared to other lakehouse formats like I…

Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah 23:05

3M ago

StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base. In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the…

Aligning Business and Data: The Essential Role of Data Modeling 1:06:51

3M ago

In this episode of the Data Engineering Podcast, Serge Gershkovich, head of product at SqlDBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that dat…

Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So 21:07

3M ago

Explore the future of AI-powered business intelligence with Lei Tang, CTO and Co-founder of Fabi.ai, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-gener…

Building the Future of Airflow Execution at Astronomer with Ian Buss and Piotr Chomiak 22:25

3M ago

The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines. In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak…

From Academia to Industry: Bridging Data Engineering Challenges 50:54

3M ago

In this episode of the Data Engineering Podcast, Professor Paul Groth of the University of Amsterdam talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage, as well as the challenges of data integration. He explores t…

Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille 24:17

3M ago

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable. In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access…

High Performance And Low Overhead Graphs With KuzuDB 1:01:29

4M ago

In this episode of the Data Engineering Podcast, Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. He discusses the usability and scalability of KuzuDB, emphasizing its…
