Arcion Joins Databricks Partner Connect To Enable Self-Serve Real-time Data Ingestion using Change Data Capture

Luke Smith, Enterprise Solutions Architect
Matt Tanner, Developer Relations Lead
Rajkumar Sen, CTO @ Arcion
April 20, 2022

Databricks launched Databricks Partner Connect as a single place to discover and connect the tools customers need to build data and AI products. With an ecosystem of deeply integrated modern technologies, Lakehouse users can reach all the solutions they need right from their Databricks account. Today, Arcion is excited to join this initiative as the first partner to offer preconfigured, validated data replication for Databricks Lakehouse users.

Partner Connect: A unified platform for data-driven enterprises

Transactional databases are, in a way, data silos by design. To do their job well, these databases must stay very secure, but in doing so they become isolated from the rest of the data infrastructure. Today, however, countless use cases require a unified data infrastructure with data streaming from all transactional databases. The traditional way of achieving this has been a compromise involving brittle pipelines and custom scripting that failed more often than it worked. In other words, manually connecting enterprise OLTP and OLAP databases is repetitive, extremely error-prone, and ultimately very time-consuming.

Databricks Partner Connect and Arcion have partnered to change this and eliminate data silos without compromises. Databricks Partner Connect enables high-volume, real-time data ingestion directly into analytics platforms, so teams can gain more meaningful business insights, drive ML workloads, and more. The vast majority of this process is automated.

Databricks Partner Connect is a huge step towards mitigating inefficiencies during data ingestion and analytics while significantly simplifying the process. It unifies pre-validated solutions from different technology partners such as Arcion and makes them available under the Databricks Lakehouse architecture. It also offers native integration among different tools, making it easier and faster to go into production. In just a few clicks, Partner Connect takes care of the tedious configuration and sets customers up with tools for:

  • Data preparation
  • Data transformation
  • Data ingestion
  • Business intelligence 
  • Machine learning
  • Data visualization

Build low-latency, change data capture (CDC) pipelines with Arcion

Change data capture (CDC) is bringing about a paradigm shift in how we store and leverage data for analytics, ML, and other resource-intensive workloads. Fully managed CDC-based data pipelines are the easiest and fastest way to experience that power without infrastructure requirements.

For instance, Arcion enables enterprises to build real-time, distributed CDC data pipelines and eliminate the data silos that lead to stale data, fractured insights, and bad decisions. It also reduces migration and licensing costs and boosts productivity across data teams.

Here at Arcion, our goal is to make data more accessible, and while fully managed CDC pipelines were a tremendous milestone in that endeavor, they are not the end. By becoming a partner of Databricks Partner Connect, we’ve made CDC and the benefits that come with it, such as transactional integrity, low impact on source systems, and end-to-end data consistency, far more accessible to the average enterprise and extended the capabilities of the Lakehouse architecture.
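To make the idea of CDC concrete, here is a minimal sketch of how a stream of captured changes can be replayed against a target so that it converges on the source. This is an illustration of the general technique only, not Arcion's implementation; the `ChangeEvent` type, its field names, and the `apply_changes` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One captured change. The field names are hypothetical, for illustration."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(target: Dict[int, Dict[str, Any]], events: List[ChangeEvent]) -> None:
    """Replay events in commit order so the target matches the source."""
    for ev in events:
        if ev.op in ("insert", "update"):
            # Upsert: the latest row image wins.
            target[ev.key] = dict(ev.row)
        elif ev.op == "delete":
            # Deleting a missing key is a no-op, which keeps replays idempotent.
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")


# A plain dict stands in for the target table, keyed by primary key.
target: Dict[int, Dict[str, Any]] = {}
apply_changes(target, [
    ChangeEvent("insert", 1, {"name": "Ada", "plan": "free"}),
    ChangeEvent("insert", 2, {"name": "Bob", "plan": "free"}),
    ChangeEvent("update", 1, {"name": "Ada", "plan": "pro"}),
    ChangeEvent("delete", 2),
])
print(target)  # {1: {'name': 'Ada', 'plan': 'pro'}}
```

Only the changes travel over the wire, which is why CDC can keep a target current without repeatedly re-reading whole source tables.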

Cloud-native architecture

Databricks Partner Connect ensures a smooth and secure connection from Arcion’s cloud-native and distributed solution to Databricks through data pipelines that scale autonomously for real-time ingestion. Arcion provides highly concurrent data ingestion from multiple sources to multiple destinations, with robust fault tolerance to ensure data integrity and consistency. Under the Partner Connect ecosystem, configuring the source and destination systems is faster and easier - taking only minutes from logging into Databricks to starting data replication from transactional databases like Oracle and MySQL and data warehouses like Snowflake.

Streaming data pipelines for real-time analytics and ML

Databricks and Arcion are both built on a no-code approach that empowers customers to integrate operational data with enterprise-wide database and application systems using pre-built connectors. It has never been easier (or more reliable) to move data from systems of record such as Oracle, the most popular transactional database; MySQL, the most popular open source relational database; and Snowflake into Databricks for powerful analytics and ML workloads, even with semi-structured and unstructured data.

This is all made possible by Arcion's cutting-edge approach to CDC (Change Data Capture) which offers real-time ingestion into Databricks’ Delta Lake platform. Experts from Databricks and Arcion went into a lot more detail about CDC and real-time ingestion in our latest webinar. The recording of that webinar is available on-demand, for free.

Reliable, zero-maintenance

Databricks and Arcion unify siloed databases with reliable, zero-maintenance data pipelines, ensuring zero-loss fault tolerance irrespective of the failure point or cause. This also saves data teams hundreds of hours by eliminating manual configuration and the possibility of human error. Promote collaboration across your entire data infrastructure and ML workflows with continuous availability.

Ease of use and unparalleled reliability make Databricks & Arcion the secret weapons of truly data-driven enterprises.

Getting Started With Arcion on Databricks Partner Connect

Customers can get started with Arcion on Databricks Partner Connect in a handful of steps. To start, log in to Databricks Partner Connect and choose Arcion. This starts a workflow that automatically provisions a SQL Endpoint and associated credentials for Arcion to interact with.

Databricks passes the user’s identity and the SQL endpoint configuration to Arcion automatically via a secure API. From here on out, Arcion does all of the heavy lifting. All that’s left for you to do is to log in to your Arcion account (or start your free 14-day trial) and configure the source database and select the objects (schemas, tables, and columns) that need to be replicated to Databricks.
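Selecting objects at the schema, table, and column level amounts to a filter applied to every row before it is replicated. The sketch below shows one way such a selection could be represented and applied. The `Selection` structure and the `project` function are hypothetical illustrations and do not reflect Arcion's actual filter format.

```python
from typing import Any, Dict, List, Optional

# Hypothetical selection: schema -> table -> columns to keep (None means all columns).
Selection = Dict[str, Dict[str, Optional[List[str]]]]


def project(selection: Selection, schema: str, table: str,
            row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the row trimmed to the selected columns, or None if the table is not selected."""
    tables = selection.get(schema)
    if tables is None or table not in tables:
        return None  # schema or table was not chosen for replication
    columns = tables[table]
    if columns is None:
        return dict(row)  # replicate every column
    return {c: row[c] for c in columns if c in row}


selection: Selection = {"sales": {"orders": ["id", "total"], "customers": None}}
row = {"id": 7, "total": 19.5, "card_number": "4111"}

print(project(selection, "sales", "orders", row))    # {'id': 7, 'total': 19.5}
print(project(selection, "sales", "payments", row))  # None
```

Filtering at the column level is also a simple way to keep sensitive fields, such as the `card_number` column in this example, out of the analytics platform.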

Below is a detailed walkthrough of using Arcion within Databricks Partner Connect to help you get started.

Step 1: Log into Databricks and select Partner Connect

Step 2: Log in to Arcion (with a username and password or Google Login)

Step 3: Set up a replication

Step 4: Select a data source (Oracle, MySQL, or Snowflake)

Step 5: Select Databricks, the default target

Step 6: Filter the data & start replication

Once the replication starts, users are taken to the Replication Dashboard for a 360-degree view of data pipelines, their performance, and their status, all updated in real time. Through the dashboard, you get access to:

  • Lag monitoring and reporting (overall and table-level replication progress)
  • A live stream of logs for easy debugging
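Replication lag can be thought of as the gap between the latest change committed on the source and the latest change applied on the target. The following is a minimal sketch of computing table-level and overall lag under that definition. The function name and the timestamp inputs are assumptions for illustration; the dashboard's own metrics may be defined differently.

```python
from datetime import datetime, timedelta
from typing import Dict


def table_lag(source_commit: Dict[str, datetime],
              target_applied: Dict[str, datetime]) -> Dict[str, timedelta]:
    """Lag per table: latest source commit time minus latest change applied on the target."""
    lags = {}
    for table, committed in source_commit.items():
        # A table with no applied changes yet is treated as having applied nothing since the epoch start.
        applied = target_applied.get(table, datetime.min)
        # Clamp at zero so small clock differences never report negative lag.
        lags[table] = max(committed - applied, timedelta(0))
    return lags


source_commit = {
    "orders": datetime(2022, 4, 20, 12, 0, 5),
    "customers": datetime(2022, 4, 20, 12, 0, 1),
}
target_applied = {
    "orders": datetime(2022, 4, 20, 12, 0, 2),
    "customers": datetime(2022, 4, 20, 12, 0, 1),
}

lags = table_lag(source_commit, target_applied)
overall = max(lags.values())  # the pipeline is only as current as its slowest table

print(lags["orders"].total_seconds())     # 3.0
print(lags["customers"].total_seconds())  # 0.0
print(overall.total_seconds())            # 3.0
```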

Step 7: Verify the replication within Databricks Data Explorer
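Beyond browsing the replicated tables, a quick sanity check is to compare row counts between source and target. The sketch below illustrates the idea with two in-memory SQLite databases standing in for the source database and for Databricks; in practice you would run the same counts through your source database's client and your Databricks SQL endpoint. The helper names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def row_counts(conn: sqlite3.Connection, tables: List[str]) -> Dict[str, int]:
    """Return the row count of each table. Table names are assumed to be trusted."""
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}


def find_mismatches(source: sqlite3.Connection, target: sqlite3.Connection,
                    tables: List[str]) -> Dict[str, Tuple[int, int]]:
    """Map each table whose counts differ to its (source_count, target_count)."""
    src = row_counts(source, tables)
    tgt = row_counts(target, tables)
    return {t: (src[t], tgt[t]) for t in tables if src[t] != tgt[t]}


# Stand-in databases: the target is missing one row from "orders".
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn, n_orders in ((source, 3), (target, 2)):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(n_orders)])
    conn.execute("INSERT INTO customers (id) VALUES (1)")

mismatches = find_mismatches(source, target, ["orders", "customers"])
print(mismatches)  # {'orders': (3, 2)}
```

Matching counts do not prove the data is identical, and counts can differ briefly while changes are still in flight, so treat this as a first check and not as a full validation.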

Arcion on Databricks Partner Connect automates the entire data ingestion process for faster analytics. The best part is that you can get started right away with a free 14-day trial that includes all the features of Arcion Cloud, with no payment info required. Give it a try and let us know how you like it.

Finally, before you leave, here is a video tutorial that gives you a step-by-step overview. Enjoy!

Matt is a developer at heart with a passion for data, software architecture, and writing technical content. In the past, Matt worked at some of the largest finance and insurance companies in Canada before pivoting to working for fast-growing startups.
Luke has two decades of experience working with database technologies and has worked for companies like Oracle, AWS, and MariaDB. He is experienced in C++, Python, and JavaScript. He now works at Arcion as an Enterprise Solutions Architect to help companies simplify their data replication process.