#449 Suggestive Trove Classifiers

Topics covered in this episode:
  • Mozilla’s Lifeline is Safe After Judge’s Google Antitrust Ruling
  • troml
  • pqrs: Command line tool for inspecting Parquet files
  • Testing for Python 3.14
Watch on YouTube
About the show

Sponsored by us! Support our work through:

Connect with the hosts

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Michael #1: Mozilla’s Lifeline is Safe After Judge’s Google Antitrust Ruling

  • A judge lets Google keep paying Mozilla to make Google the default search engine but only if those deals aren’t exclusive.
  • More than 85% of Mozilla’s revenue comes from Google search payments.
  • The ruling forbids Google from making exclusive contracts for Search, Chrome, Google Assistant, or Gemini, and forces data sharing and search syndication so rivals get a fighting chance.

Brian #2: troml - suggests or fills in trove classifiers for your projects

  • Adam Hill
  • This is super cool and so welcome.
  • Trove Classifiers are things like Programming Language :: Python :: 3.14 that allow for some fun stuff to show up in PyPI, like the versions you support, etc.
  • Note that just saying you require 3.9+ doesn’t tell the user that you’ve actually tested stuff on 3.14. I like to keep Trove Classifiers around for this reason.
  • Also, the License classifier is deprecated, and if you include it, the license shows up in two places: in Meta and in the Classifiers section. It's probably good to have it in only one place, so I’m going to be removing it from the classifiers for my projects.
  • One problem: classifier text has to exactly match an entry in the classifier list, so we usually recommend copy/pasting from that list.
  • But no longer! Just use troml!
  • It just fills them in for you (if you run troml suggest --fix). How totally awesome is that!
  • I tried it on pytest-check, and it was mostly right. It suggested adding 3.15, which I haven’t tested yet, so I’m not ready to add that just yet. :)
  • BTW, I talked with Brett Cannon about classifiers back in ‘23 if you want some more in-depth info on trove classifiers. (A small sketch of what troml fills in follows this list.)
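
As a rough illustration, here is a hypothetical pyproject.toml excerpt (the project name, Python floor, and supported versions are made up, not taken from pytest-check); this is the kind of classifiers list that troml suggest --fix fills in, using the exact strings from the official classifier list and leaving out the deprecated License classifier:

```toml
# Hypothetical pyproject.toml excerpt; classifier strings must match the
# official classifier list exactly, one entry per supported Python version.
[project]
name = "example-package"        # placeholder name
requires-python = ">=3.9"
classifiers = [
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Programming Language :: Python :: 3.14",
]
```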

Michael #3: pqrs: Command line tool for inspecting Parquet files

  • pqrs is a command line tool for inspecting Parquet files
  • This is a replacement for the parquet-tools utility, written in Rust
  • Built using the Rust implementation of Parquet and Arrow
  • pqrs roughly means "parquet-tools in rust"
  • Why Parquet?
    • Size
      • A 200 MB CSV will usually shrink to roughly 20–100 MB as Parquet, depending on the data and compression. Loading a Parquet file is typically several times faster than parsing CSV, often 2x–10x faster for a full-file load and much faster when you only read some columns.
    • Speed
      • Full-file load into pandas: Parquet with pyarrow/fastparquet is usually 2x–10x faster than reading CSV with pandas because CSV parsing is CPU intensive (text tokenizing, dtype inference).
        • Example: if read_csv is 10 seconds, read_parquet might be ~1–5 seconds depending on CPU and codec.
      • Column subset: Parquet is much faster if you only need some columns — often 5x–50x faster because it reads only those column chunks.
      • Predicate pushdown & row groups: When using dataset APIs (pyarrow.dataset) you can push filters to skip row groups, reducing I/O dramatically for selective queries.
      • Memory usage: Parquet avoids temporary string buffers and repeated parsing, so peak memory and temporary allocations are often lower. (A short sketch of these access patterns follows this list.)
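
For quick inspection from the command line, pqrs has subcommands along these lines (the file name is a placeholder and the subcommands are the ones from the project README, so check pqrs --help for the current set):

```bash
# Hedged sketch: inspect a local Parquet file with pqrs.
pqrs schema data.parquet   # print the file's schema
pqrs cat data.parquet      # dump the records to stdout
```

And here is a minimal pandas/pyarrow sketch of the three access patterns described above (full-file load, column subset, predicate pushdown); the file and column names are placeholders, not data from the episode:

```python
import pandas as pd
import pyarrow.dataset as ds

# Full-file load: CSV parsing is CPU bound (tokenizing, dtype inference),
# so the Parquet read is typically several times faster.
df_csv = pd.read_csv("data.csv")
df_parquet = pd.read_parquet("data.parquet")

# Column subset: only the requested column chunks are read from disk.
df_cols = pd.read_parquet("data.parquet", columns=["user_id", "amount"])

# Predicate pushdown via the dataset API: the filter can skip whole row
# groups, so selective queries touch far less data.
dataset = ds.dataset("data.parquet", format="parquet")
table = dataset.to_table(
    columns=["user_id", "amount"],
    filter=ds.field("amount") > 100,
)
df_filtered = table.to_pandas()
```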

Brian #4: Testing for Python 3.14

  • Python 3.14 is just around the corner, with a final release scheduled for October.
  • What’s new in Python 3.14
  • Python 3.14 release schedule
  • Adding 3.14 to your CI tests in GitHub Actions
    • Add “3.14” and optionally “3.14t” for freethreaded
    • Add the line allow-prereleases: true (a minimal workflow sketch follows this list)
  • I got stuck on this and asked folks on Mastodon and Bluesky
  • A couple folks suggested the allow-prereleases: true step. Thank you!
  • Ed Rogers also suggested Hugo’s article Free-threaded Python on GitHub Actions, which I had read and forgot about. Thanks Ed! And thanks Hugo!
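
Roughly, those two additions look like this in a GitHub Actions workflow. The matrix entries and the allow-prereleases line come straight from the notes above; the rest of the job (checkout, install, and test commands) is a generic placeholder, not the show's actual workflow:

```yaml
# Hypothetical .github/workflows/tests.yml excerpt.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # "3.14" is the new interpreter, "3.14t" the optional free-threaded build.
        python-version: ["3.12", "3.13", "3.14", "3.14t"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          allow-prereleases: true   # lets setup-python install 3.14 pre-releases
      - run: pip install -e ".[test]"   # placeholder: install your test dependencies
      - run: pytest                     # placeholder: run your test suite
```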

Extras

Brian:

Michael:

Joke: Console Devs Can’t Find a Date
