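Content provided by Henry Ng. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Henry Ng or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ko.player.fm/legal.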

Intricacies of web scraping in 2023 with Pierluigi Vinciguerra, founder of The Web Scraping Club

28:29
 

Archived series ("Inactive feed" status)

When? This feed was archived on September 29, 2024 21:08 (7d ago). Last successful fetch was on February 26, 2024 20:47 (7M ago)

Why? Inactive feed status. The podcast could not be retrieved because of a server problem.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.

In this episode of Ethical Data, Explained, Henry Ng is joined by Pierluigi Vinciguerra, founder of The Web Scraping Club as well as Co-founder and CTO of Re Analytics - Databoutique.com. Pier is a web scraping professional with more than 15 years of experience in data sourcing. We discussed web scraping's past, present, and future: how the technology is evolving and what to expect in the coming years, which trends are emerging and driving the market, and whether the future of web scraping for businesses lies with in-house or outsourced teams. We also talked about what determines the success of a web scraping project and how to choose a proxy provider for your project.
This episode is a great opportunity to learn more about the man behind the Web Scraping Club project and get his perspective on the industry and its future.

Quotes


1. “If we are talking about the success of a small web scraping project, the most important thing is the quality of the output. If you're selling this project you need to create trust between you as a provider and a user and you need to put all the effort you can to provide quality data. To do so you need to set up a process of data quality with the most common techniques like human count regression, trends forecasting, etc. For large-scale projects, this applies as well but you also need to think about your scraping architecture. If you're building something that you're going to scale you need to standardize your processes.”
2. “Web scraping is becoming harder and more expensive. 10 years ago there was no need to have any proxy unless you needed to bypass a geo-fence of a website. Now you need much more tools - proxies, headless browsers...”
3. “Many in the industry try to sell their APIs for automatic extraction from websites. This is a trend I've seen started four or five years ago and I think it's a good trend for the data sourcing industry because it resolves quite a number of issues.”
4. “There is more attention to the sourcing of the IP from many proxy providers, the narrative of the proxy provider about the proxy industries moved to the ethical sourcing of the IP. It's good for this industry because web scraping has always been seen as shady. But it's totally legit if you do it in a proper way.”
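
The data quality process mentioned in the first quote is not detailed in the episode notes. As a rough illustration only, the sketch below shows two checks a scraping pipeline might run on each output batch: a count comparison against earlier runs and a missing-field rate. The function names, the 20% tolerance, and the record layout are all hypothetical and are not taken from the episode.

```python
from statistics import mean


def count_regression(history, current, tolerance=0.2):
    """Return True if `current` is more than `tolerance` below the mean of `history`.

    `history` is a list of item counts from earlier runs (hypothetical input).
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return current < baseline * (1 - tolerance)


def missing_field_rate(records, field):
    """Return the share of records in which `field` is absent, None, or empty."""
    if not records:
        return 1.0  # an empty batch is treated as fully missing
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


if __name__ == "__main__":
    previous_counts = [1000, 980, 1020]  # illustrative numbers only
    batch = [{"price": 10}, {"price": None}]
    print(count_regression(previous_counts, 600))
    print(missing_field_rate(batch, "price"))
```

A batch that trips either check would typically be held back for review before it reaches the data user, which is one way to build the provider-user trust the quote describes.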

3 questions we ask all guests:


1. Who in the world of Tech/Data would Pier take out for lunch?
2. What piece of software couldn't Pier imagine life without?
Scrapy - an open-source and collaborative framework for extracting data from websites.

3. What real-life problem did Pier solve using data?
He wrote a scraper to help him buy a TV, which eventually saved him 300-400 EUR.

Episode Resources

If you enjoyed this episode then please either:

Subscribe, rate, and review the "Ethical Data, Explained" podcast on Apple Podcasts.
Follow the "Ethical Data, Explained" podcast on Spotify.
Follow the "Ethical Data, Explained" podcast on Google Podcasts.
Watch full episodes of the "Ethical Data, Explained" podcast on YouTube.
To learn more about SOAX, visit the website.

Ethical Data, Explained is handcrafted by our friends over at: fame.so