Multi-modal Deep Learning for Complex Document Understanding with Doug Burdick - #541

45:32
 
 


Today we’re joined by Doug Burdick, a principal research staff member at IBM Research. In a recent interview, Doug’s colleague Yunyao Li joined us to talk through some of the broader enterprise NLP problems she’s working on. One of those problems is making documents machine-consumable, especially documents in a traditionally archival file type, the PDF. That’s where Doug and his team come in.

In our conversation, we discuss the multimodal approach they’ve taken to identify, interpret, contextualize, and extract things like tables from a document, the challenges they’ve faced when dealing with tables, and how they evaluate model performance on tables. We also explore how he’s handled generalizing across different formats, how much fine-tuning is needed for models to be effective, the problems that appear on the NLP side of things, and how deep learning models are being leveraged within the group.

The complete show notes for this episode can be found at twimlai.com/go/541
