Data Engineering (Public)
 
Welcome to Data Brew by Databricks with Denny and Brooke! In this series, we explore various topics in the data and AI community and interview subject matter experts in data engineering/data science. So join us with your morning brew in hand and get ready to dive deep into data + AI! For this first season, we will be focusing on lakehouses – combining the key features of data warehouses, such as ACID transactions, with the scalability of data lakes, directly against low-cost object stores.
 
Data Driven: the podcast where we explore the emerging field of Data Science. We bring the best minds in Data, Software Engineering, Machine Learning, and Artificial Intelligence right to you every Tuesday. The field of data science mashes up the worlds of statistics, database architecture and software engineering. Data Scientist has been labelled by the Harvard Business Review as "the sexiest job of the 21st century." A quick search of job search sites reveals that this field is in high dem ...
 
Leading women in data science share their work, advice, and lessons learned along the way with Professor Margot Gerritsen from Stanford University. Hear about how data science is being applied and having impact across a wide range of domains, from healthcare to finance to cosmology to human rights and more. This podcast is brought to you by the Stanford Institute for Computational & Mathematical Engineering (ICME) and Stanford Data Science. Generous support for this podcast and other Women i ...
 
Data. Is it all about technology, engineering or coding? We often don't see the immediate impact on other fields. But life sciences require data to advance. Join pioneers, entrepreneurs, investors and academics to discover the power and influence of data in medicine, genomics, biodiversity, marine sciences... When data meets know-how in the life sciences, we immerse ourselves in a different world – bioinformatics. Supported by ELIXIR Europe
 
Unexpected Data is the first Austrian podcasting service centered on Data and Data Science. Unexpected Data helps individuals and organizations to embed AI ethics by design in their digital journey. Founded and hosted by Yudan Lin, our podcasts and services are devoted to starting taboo-free discussions and raising awareness of Data and Data Science in our daily life. By creating unexpected and inspirational content, Unexpected Data supports you in understanding what it takes to live ...
 
The Engineering IRL podcast is here to help improve problem solving skills for all people by breaking down engineering design concepts and applying them to real life. Engineers are the professional problem solvers and they solve both simple and complex problems. Use this engineering podcast to learn passively and gain some insights to problem solving ideas and techniques you can use in real life. https://www.engineeringinreallife.com Better Problem Solving. Better designs. Better Engineers.
 
REVISED: This course was updated during the fall 2016 semester. New data structures include left-leaning red-black trees and the Nguyen-Wong implementation of B-trees, both containing complete algorithms for insertion and removal. Catalog description: Data structures and design patterns with the C++ language. Analysis of algorithms. Sorting algorithms: insertion sort, merge sort, heapsort, quicksort. Linear data structures: stacks, queues, linked lists. Dictionaries. Hash tables. Trees: binary ...
 
 
Summary Working with unstructured data has typically been a motivation for a data lake. The challenge is imposing enough order on the platform to make it useful. Kirk Marple has spent years working with data systems and the media industry, which inspired him to build a platform for automatically organizing your unstructured assets to make them more…
 
The company StreamSets is enabling DataOps practices in today’s enterprises. StreamSets is a data engineering platform designed to help engineers design, deploy, and operate smart data pipelines. StreamSets Data Collector is a codeless solution for designing pipelines, triggering CDC operations, and monitoring data in flight. StreamSets Transformer…
 
Have you ever stopped to think about where data analysis comes into famous corruption cases in Brazil? How do specialists uncover corporate fraud, such as money laundering or embezzlement? What tools do they use? Are they like CSI? For this conversation we invited Osvaldo Aranha, who is Head of Azure Data & AI at Microsoft and has more than 10 years of ex…
 
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Erin LeDell shares valuable insight on AutoML, what problems are best …
 
Delivering SaaS products involves a lot more than just building the product. SaaS management involves customer relationship management, licensing, renewals, maintaining software visibility, and the general management of the technology portfolio. The company Blissfully helps businesses manage their SaaS products from within a complete IT platform wi…
 
Amundsen was started at Lyft and is the leading open-source data catalog with the fastest-growing community and the most integrations. Amundsen enables you to search your entire organization by text search, see automated and curated metadata, share context with coworkers, and learn from others by seeing most common queries on a table or frequently…
 
Summary When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage, or querying a database. To speed up the process, why not build the model inside the database so that you don’t have to move the information? In this episode Paige Roberts explains the benefits of p…
 
Summary Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. Now they offer the technologies that they run internally to external users of their cloud platform. In this episode Lak Lakshmanan enumerates the variety of services that are available for building your various data processing and ana…
 
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Good machine learning starts with high quality data. Irina Malkova sha…
 
Karen Hao trained as a mechanical engineer and then joined a Silicon Valley startup, thinking that technology was the best means to create social change. While surrounded by smart people who were also passionate about using technology for social change, she soon discovered there were no incentives or pathways to accomplish this. “When you're inside…
 
Summary The way to build maintainable software and systems is through composition of individual pieces. By making those pieces high quality and flexible they can be used in surprising ways that the original creators couldn’t have imagined. One such component that has gone above and beyond its originally envisioned use case is BookKeeper, a distribu…
 
Summary SQL is the most widely used language for working with data, and yet the tools available for writing and collaborating on it are still clunky and inefficient. Frustrated with the lack of a modern IDE and collaborative workflow for managing the SQL queries and analysis of their big data environments, the team at Pinterest created Querybook. I…
 
Summary Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain workflows and data pipelines to transform, clean, and integrate it. In order for the true value of your data to be realized without burning out your engineers you need a way for everyone to get access to the information they …
 
In this second episode of the fifth season, Frank and Andy speak to Chris Wexler about using AI to protect the vulnerable. Speaking of which, I would like to advise you, dear listener, that this show touches on some sensitive areas, namely child sexual abuse materials. If you have little ears or sensitive persons within listening range, you may wan…
 
For as long as we have existed, humans have tried to defeat death. Given the exponentially growing amount of data, can we merge with AI? Some say that 'technology is a new phase of evolution and if we don't adapt to it, we will become extinct' - Deepak Chopra. Find more answers in this second episode with Bruno Guerreiro. Find Bruno on LinkedIn : https://www.linkedin.com…
 
Summary The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse platform that was built from the ground up for spee…
 
Data exploration uses visual exploration to understand what is in a dataset and the characteristics of the data. Data scientists explore data to understand things like customer behavior and resource utilization. Some common programming languages used for data exploration are Python, R, and Matlab. Doris Jung-Lin Lee is currently a Graduate Research…
 
Summary Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the models to integrate with external systems their internal state has to be translated into a lower dimension. To eliminate this impedance mismatch Edo Liberty founded Pinecone to build a database that works na…
 
Cloud data warehouses are databases hosted in cloud environments. They provide typical benefits of the cloud like flexible data access, scalability, and performance. The company Firebolt provides a cloud data warehouse built for modern data environments. It decouples storage and compute to operate on top of existing data lakes like S3. It computes …
 
What if we had to start all over again? Which mistakes would we avoid making at the start of our careers? Would we still go to college? Which skills do we consider the most important? Paulo Vasconcellos, Allan Sene and Gabriel Lages share what they would do if they had to start their careers all over again. Come along, this episode is a really good one! Check out our post …
 
Summary Data governance is a phrase that means many different things to many different people. This is because it is actually a concept that encompasses the entire lifecycle of data, across all of the people in an organization who interact with it. Stijn Christiaens co-founded Collibra with the goal of addressing the wide variety of technological a…
 
Apache Superset is an open-source, fast, lightweight and modern data exploration and visualization platform. It can connect to any SQL based data source through SQLAlchemy at petabyte scale. Its architecture is highly scalable and it ships with a wide array of visualizations. The company Preset provides a powerful, easy to use data exploration and …
 
Summary Data lineage is the common thread that ties together all of your data pipelines, workflows, and systems. In order to get a holistic understanding of your data quality, where errors are occurring, or how a report was constructed you need to track the lineage of the data from beginning to end. The complicating factor is that every framework, …
 
Columnar databases store and retrieve columns of data rather than rows of data. Each block of data in a columnar database stores up to 3 times as many records as row-based storage. This means you can read data with a third of the power needed in row-based data, among other advantages. The company Altinity is the leading enterprise provider for Clic…
 
Summary There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility then you might consider building on top of the venerable Postgr…
 
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Liam Li is a leading researcher in the fields of hyperparameter optimi…
 
Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development. This framework more efficiently manages business requirements like data lifecycle and improves data quality. Some common use cases for Hudi are record-level insert, update, and delete, simplified file management and nea…
 
An application programming interface, API for short, is the connector between 2 applications. For example, a user interface that needs user data will call an endpoint, like a special URL, with request parameters and receive the data back if the request is valid. Modern applications rely on APIs to send data back and forth to each other and save, ed…
 
Summary Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full time job. The team at Tinybird wants to make it easy to turn a continuous stream of data into a production ready API or data product. In this episode CEO Jorge Sancha explains how they have architected their system to handle high data…
 
The traveling salesman problem is a classic challenge of finding the shortest and most efficient route for a person to take given a list of destinations. This is one of many real-world optimization problems that companies encounter. How should they schedule product distribution, or promote product bundles, or define sales territories? The answers t…
 
Although taboo in many societies, death is an inevitable part of the human condition. Some religions speak of an afterlife, and many people have kept their loved ones alive in the digital world. Given the exponentially growing amount of data, are data scientists the ones able to effectively tackle real-world problems & help in…
 
Summary Spark is one of the most well-known frameworks for data processing, whether for batch or streaming, ETL or ML, and at any scale. Because of its popularity it has been deployed on every kind of platform you can think of. In this episode Jean-Yves Stephan shares the work that he is doing at Data Mechanics to make it sing on Kubernetes. He exp…
 
The multi-talented Cecilia Aragon is a data scientist, professor, author and champion aerobatic pilot. In this podcast, she explains how learning to fly gave her the confidence to pursue her career in human-centered data science and as an author. Her book, Flying Free: My Victory Over Fear to Become the First Latina Pilot on the US Aerobatic Team, …
 
In software engineering, telemetry is the data that is collected about your applications. Unlike logging, which is used in the development of apps to pinpoint errors and code flows, telemetry data includes all operational data including logs, metrics, events, traces, usage, and other analytical data. Companies usually visualize this information to …
 
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Adam Oliner discusses how to design your infrastructure to support ML,…
 
US industries have historically led in the adoption of advanced automation, all the way back to the first Unimate installed in a New Jersey diecasting plant in 1961. Today, automation has come to describe a broad spectrum of technologies, from smart sensors to autonomously guided vehicles, and until now no single trade association has emerged to re…
 