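Content provided by Christian Krug. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Christian Krug or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ko.player.fm/legal.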

Fixing Dirty Data - How to Clean Data by Classification and Normalization | Susan Walsh

40:02

In the first-ever English episode of UNF#CK YOUR DATA, host Christian Krug interviews Susan Walsh, the classification guru, on how to clean your dirty data.

But first, what is dirty data, and why is it a problem?

Data in your company systems, such as CRM or ERP, can have all sorts of issues: duplicates, near-duplicates, inconsistent formats, and so on.

So records that should match don't, or your numbers are off.

Basically, you can't rely on the data in the system to make decisions, such as sending a mail or a leaflet, potentially even an invoice, or knowing who your real number-one customer is.

To help you deal with this mess, Susan has created a framework that helps you clean up your data: you normalize and then classify it. First, agree on a common format and fit the data to it. Afterwards, you can give the data meaning by classifying it.

That way, you can process the data further and base your decisions on it.
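The two-step idea (agree on a common format, then classify) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Susan's framework or her COAT method: the legal-suffix list, the supplier names, and the category rules are all hypothetical. Unknown suppliers are routed to a human instead of being guessed, in line with the point that this work still needs human knowledge.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes to strip; a real list would be agreed with the business.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "gmbh", "plc", "co", "corp"}

# Hypothetical classification rules: normalized supplier name -> spend category.
CATEGORY_RULES = {
    "Acme Office Supplies": "Office Supplies",
    "Globex": "IT Services",
}


def normalize_supplier(name: str) -> str:
    """Step 1 (normalize): map a raw supplier name onto one agreed common format."""
    cleaned = name.lower()
    cleaned = re.sub(r"[^a-z0-9 ]", " ", cleaned)  # replace punctuation with spaces
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens).title()


def classify(normalized: str) -> str:
    """Step 2 (classify): give the normalized name a meaning via the rule table."""
    # Anything not covered by the rules is flagged for human review.
    return CATEGORY_RULES.get(normalized, "Needs review")


def clean(records):
    """Normalize, then classify; group raw near-duplicates under one canonical name."""
    grouped = defaultdict(list)
    for raw in records:
        grouped[normalize_supplier(raw)].append(raw)
    return {name: (classify(name), variants) for name, variants in grouped.items()}


if __name__ == "__main__":
    # Hypothetical raw records containing near-duplicates.
    records = [
        "ACME Office Supplies Ltd.",
        "Acme Office-Supplies Limited",
        "Globex Corp.",
        "Initech, Inc",
    ]
    for name, (category, variants) in clean(records).items():
        print(name, "|", category, "|", variants)
```

The two spellings of Acme collapse into one record once normalized, which is exactly the "records that should match, don't" problem described above. The rule table stays deliberately small and explicit, because deciding what each supplier actually is remains a business decision.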

Sad news for all the AI enthusiasts out there: this still requires an awful lot of human knowledge. There's no speeding up the process.

On the other hand, this step is crucial for your AI success, since only good-quality training data leads to great AI results, regardless of which use case you tackle first.

But cleaning data once is not a lasting solution. It's a continuous effort, and it has to start at the very source, where people enter the data into the systems.

So data quality is a process and a mantra.

Find out in this episode:

- Why data is sometimes so dirty

- How the COAT method can help you clean data

- Why data quality is not an AI topic

- Susan's plans for a new framework

▬▬▬▬▬▬ Profiles: ▬▬▬▬

Susan at LinkedIn: https://www.linkedin.com/in/susanewalsh/

Christian at LinkedIn: https://www.linkedin.com/in/christian-krug/

Unf*ck Your Data at LinkedIn: https://www.linkedin.com/company/unfck-your-data

▬▬▬▬▬▬ Book recommendation: ▬▬▬▬

Susan's book recommendation: Buy Back Your Time by Dan Martell

The “UYD” bookshelf at Melena’s store: https://gunzenhausen.buchhandlung.de/unfuckyourdata

▬▬▬▬▬▬ Where to find UNF#CK YOUR DATA: ▬▬▬▬

Podcast at Spotify: https://open.spotify.com/show/6Ow7ySMbgnir27etMYkpxT?si=dc0fd2b3c6454bfa

Podcast at iTunes: https://podcasts.apple.com/de/podcast/unf-ck-your-data/id1673832019

Podcast at Deezer: https://deezer.page.link/FnT5kRSjf2k54iib6

▬▬▬▬▬▬ Contact: ▬▬▬▬

E-Mail: christian@uyd-podcast.com

▬▬▬▬▬▬ Timestamps: ▬▬▬▬▬▬▬▬▬▬▬▬▬

00:00 Introduction and Welcome

01:13 Susan's Background and Expertise

03:03 Types of Dirty Data

04:01 The Impact of Dirty Data

06:12 Cleaning Data and the Role of Excel

07:34 The Limitations of AI in Data Cleaning

09:26 Automating Supplier Name Normalization

11:03 Data Classification and Context

13:52 The Importance of Business Understanding

16:26 The Role of Human Expertise in Data Work

19:32 Data Normalization and Classification

22:33 The Importance of Clean and Organized Data

27:19 The 'Data Coat' Methodology

31:26 The Value of Humor in Business

33:53 Book Recommendation: 'Buy Back Your Time'


105 episodes
