Artwork

Tobias Macey에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Tobias Macey 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.
Player FM -팟 캐스트 앱
Player FM 앱으로 오프라인으로 전환하세요!

Build A Data Lake For Your Security Logs With Scanner

1:02:38
 
공유
 

Manage episode 398113046 series 3449056
Tobias Macey에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Tobias Macey 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

Summary

Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high scale log data for security auditing. In this episode he shares the story of how it got started, how it works, and how you can get started with it.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Cliff Crosland about Scanner, a security data lake platform for analyzing security logs and identifying issues quickly and cost-effectively

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Scanner is and the story behind it?
    • What were the shortcomings of other tools that are available in the ecosystem?
  • What is Scanner explicitly not trying to solve for in the security space? (e.g. SIEM)
  • A query engine is useless without data to analyze. What are the data acquisition paths/sources that you are designed to work with?- e.g. cloudtrail logs, app logs, etc.
    • What are some of the other sources of signal for security monitoring that would be valuable to incorporate or integrate with through Scanner?
  • Log data is notoriously messy, with no strictly defined format. How do you handle introspection and querying across loosely structured records that might span multiple sources and inconsistent labelling strategies?
  • Can you describe the architecture of the Scanner platform?
    • What were the motivating constraints that led you to your current implementation?
    • How have the design and goals of the product changed since you first started working on it?
  • Given the security oriented customer base that you are targeting, how do you address trust/network boundaries for compliance with regulatory/organizational policies?
  • What are the personas of the end-users for Scanner?
    • How has that influenced the way that you think about the query formats, APIs, user experience etc. for the prroduct?
  • For teams who are working with Scanner can you describe how it fits into their workflow?
  • What are the most interesting, innovative, or unexpected ways that you have seen Scanner used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Scanner?
  • When is Scanner the wrong choice?
  • What do you have planned for the future of Scanner?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com) with your story.

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Support Data Engineering Podcast

  continue reading

427 에피소드

Artwork
icon공유
 
Manage episode 398113046 series 3449056
Tobias Macey에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Tobias Macey 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.

Summary

Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high scale log data for security auditing. In this episode he shares the story of how it got started, how it works, and how you can get started with it.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Cliff Crosland about Scanner, a security data lake platform for analyzing security logs and identifying issues quickly and cost-effectively

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Scanner is and the story behind it?
    • What were the shortcomings of other tools that are available in the ecosystem?
  • What is Scanner explicitly not trying to solve for in the security space? (e.g. SIEM)
  • A query engine is useless without data to analyze. What are the data acquisition paths/sources that you are designed to work with?- e.g. cloudtrail logs, app logs, etc.
    • What are some of the other sources of signal for security monitoring that would be valuable to incorporate or integrate with through Scanner?
  • Log data is notoriously messy, with no strictly defined format. How do you handle introspection and querying across loosely structured records that might span multiple sources and inconsistent labelling strategies?
  • Can you describe the architecture of the Scanner platform?
    • What were the motivating constraints that led you to your current implementation?
    • How have the design and goals of the product changed since you first started working on it?
  • Given the security oriented customer base that you are targeting, how do you address trust/network boundaries for compliance with regulatory/organizational policies?
  • What are the personas of the end-users for Scanner?
    • How has that influenced the way that you think about the query formats, APIs, user experience etc. for the prroduct?
  • For teams who are working with Scanner can you describe how it fits into their workflow?
  • What are the most interesting, innovative, or unexpected ways that you have seen Scanner used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Scanner?
  • When is Scanner the wrong choice?
  • What do you have planned for the future of Scanner?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com) with your story.

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Support Data Engineering Podcast

  continue reading

427 에피소드

Tous les épisodes

×
 
Loading …

플레이어 FM에 오신것을 환영합니다!

플레이어 FM은 웹에서 고품질 팟캐스트를 검색하여 지금 바로 즐길 수 있도록 합니다. 최고의 팟캐스트 앱이며 Android, iPhone 및 웹에서도 작동합니다. 장치 간 구독 동기화를 위해 가입하세요.

 

빠른 참조 가이드