Change Data Capture for the Databricks Lakehouse: Real-Time Ingestion to Enable Modern Analytics
The Databricks Lakehouse offers enterprises the opportunity to consolidate BI and data science workloads onto the same platform, combining the performance and governance of the data warehouse with the flexibility of the data lake. But to achieve this, the Databricks Lakehouse needs to ingest high volumes of data from operational and analytical databases and data warehouses at low latency. Streaming data pipelines, enabled by real-time change data capture (CDC), can help. CDC continuously transfers new and changed data into Databricks, which increases efficiency, scalability, and performance compared with legacy batch pipelines.
This webinar explains what CDC is, why enterprises need it, and how to design CDC pipelines for ingestion into the Databricks Lakehouse. It also defines three must-have elements—multi-node streaming, auto-scaling, and heterogeneous platform support—that enable CDC to meet modern enterprise requirements.
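To make the idea of "new and changed data" concrete, here is a minimal Python sketch of how a CDC consumer might apply a stream of change events to a target table. It is an illustration only, not how Arcion or Databricks implement CDC: the `ChangeEvent` type, the `apply_changes` function, and the in-memory dictionary standing in for the target table are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source table (hypothetical shape)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # full row image; None for deletes


def apply_changes(target: Dict[int, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Apply change events, in order, to an in-memory stand-in for the target."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: write the latest row image for this key.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # Remove the row if present; ignore deletes for unknown keys.
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target


# Assumed sample data: the target starts with one row, then three changes arrive.
target_table = {1: {"name": "Ada", "city": "London"}}
events = [
    ChangeEvent("insert", 2, {"name": "Grace", "city": "New York"}),
    ChangeEvent("update", 1, {"name": "Ada", "city": "Paris"}),
    ChangeEvent("delete", 2),
]
apply_changes(target_table, events)
print(target_table)
```

Only the three changed rows are processed, rather than reloading the whole table as a batch pipeline would. A production pipeline would also need ordering guarantees, schema handling, and a durable target, none of which are modeled here.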
Join the discussion with Eckerson Group, Databricks, and Arcion to learn:
- Adoption drivers and use cases for the Databricks Lakehouse
- The role of CDC in migrations and ongoing updates
- Must-have elements for effective CDC
- A demo of real-time CDC with Arcion to Databricks Lakehouse
Can’t make it? Sign up anyway to receive the replay in your inbox.
Meet the speakers
Rajkumar Sen is the Founder and CTO at Arcion Labs Inc., the only cloud-native, CDC-based data mobility platform. Before that, Raj was a Director of Engineering at SingleStore (previously MemSQL), where he architected the query optimizer and the distributed query processing engine. Prior to that, Raj was a Principal Engineer at Oracle, where he developed features for the Oracle database query optimizer, and a Senior Staff Engineer at Sybase, where he architected several components for the Sybase Database Cluster Edition.
Soham Bhatt is a Solutions Architect leading the EDW and ETL modernization practice at Databricks. Before Databricks, he worked at Toyota Motors on building their next-generation Big Data Platform. Prior to that, he built Enterprise Data Warehouses for Fortune 100 companies using the Inmon and Kimball methodologies. In his current role, he loves guiding customers with best practices as they migrate their EDWs to Data Lakehouses.
For 25 years, Kevin has deciphered what technology means to practitioners as an industry analyst, writer, instructor, marketer, and services leader. Kevin launched, built, and led a profitable data services team for EMC Pivotal in the Americas and EMEA, and ran field training at the data integration software provider Attunity (now part of Qlik). A frequent public speaker and co-author of two books on data streaming, Kevin is also a data management instructor at eLearningCurve.