Artwork

O'Reilly Media에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 O'Reilly Media 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.
Player FM -팟 캐스트 앱
Player FM 앱으로 오프라인으로 전환하세요!

Labeling, transforming, and structuring training data sets for machine learning

40:51
 
공유
 

Manage episode 248276917 series 1652310
O'Reilly Media에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 O'Reilly Media 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to having a framework that enables the programmatic creation of training data sets.

Snorkel pipeline for data labeling
Snorkel pipeline for data labeling. Source: Alex Ratner, used with permission.

We had a great conversation spanning many topics, including:

  • Why he and his collaborators decided to focus on “data programming” and tools for building and managing training data.
  • A tour through Snorkel, including its target users and key components.
  • What’s in the newly released version (v 0.9) of Snorkel.
  • The number of Snorkel’s users has grown quite a bit since we last spoke, so we went through some of the common use cases for the project.
  • Data lineage, AutoML, and end-to-end automation of machine learning pipelines.
  • Holoclean and other projects focused on data quality and data programming.
  • The need for tools that can ease the transition from raw data to derived data (e.g., entities), insights, and even knowledge.

Related resources:

  continue reading

133 에피소드

Artwork
icon공유
 
Manage episode 248276917 series 1652310
O'Reilly Media에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 O'Reilly Media 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to having a framework that enables the programmatic creation of training data sets.

Snorkel pipeline for data labeling
Snorkel pipeline for data labeling. Source: Alex Ratner, used with permission.

We had a great conversation spanning many topics, including:

  • Why he and his collaborators decided to focus on “data programming” and tools for building and managing training data.
  • A tour through Snorkel, including its target users and key components.
  • What’s in the newly released version (v 0.9) of Snorkel.
  • The number of Snorkel’s users has grown quite a bit since we last spoke, so we went through some of the common use cases for the project.
  • Data lineage, AutoML, and end-to-end automation of machine learning pipelines.
  • Holoclean and other projects focused on data quality and data programming.
  • The need for tools that can ease the transition from raw data to derived data (e.g., entities), insights, and even knowledge.

Related resources:

  continue reading

133 에피소드

모든 에피소드

×
 
Loading …

플레이어 FM에 오신것을 환영합니다!

플레이어 FM은 웹에서 고품질 팟캐스트를 검색하여 지금 바로 즐길 수 있도록 합니다. 최고의 팟캐스트 앱이며 Android, iPhone 및 웹에서도 작동합니다. 장치 간 구독 동기화를 위해 가입하세요.

 

빠른 참조 가이드