Databricks launched Databricks Partner Connect as the world's first complete platform bringing together the tools customers need to build data and AI products. With an ecosystem of deeply integrated modern technologies, Lakehouse users can reach all the solutions they need right from their Databricks account. Today, Arcion is excited to join this initiative as the first partner to offer preconfigured, validated data replication for Databricks Lakehouse users.
Partner Connect: A unified platform for data-driven enterprises
Transactional databases are, in a way, data silos by design. To do their job well, these databases must stay very secure, but in doing so they become isolated from the rest of the data infrastructure. Today, however, countless use cases require a unified data infrastructure with data streaming in from all transactional databases. The traditional way of achieving this has been a compromise, involving brittle pipelines and custom scripting that failed more often than it worked. In other words, manually connecting enterprise OLTP and OLAP databases is repetitive, extremely error-prone, and ultimately very time-consuming.
Databricks Partner Connect and Arcion have partnered to change this and eliminate data silos without compromises. Databricks Partner Connect enables high-volume, real-time data ingestion directly into analytics platforms so teams can gain more meaningful business insights, drive ML workloads, and more. And the vast majority of this process is completely automated.
Databricks Partner Connect is a huge step towards mitigating inefficiencies during data ingestion and analytics while significantly simplifying the process. It unifies pre-validated solutions from different technology partners such as Arcion and makes them available under the Databricks Lakehouse architecture. It also offers native integration among different tools, making it easier and faster to go into production. In just a few clicks, Partner Connect takes care of all the tedious configuration and sets customers up with tools for:
- Data preparation
- Data transformation
- Data ingestion
- Business intelligence
- Machine learning
- Data visualization
Build low-latency, change data capture (CDC) pipelines with Arcion
Change data capture (CDC) is bringing about a paradigm shift in how we store and leverage data for analytics, ML, and other resource-intensive workloads. And fully-managed CDC-based data pipelines are the easiest and fastest way to experience that power without infrastructure requirements.
For instance, Arcion's powerful features enable enterprises to build real-time, distributed CDC data pipelines and eliminate the data silos that lead to stale data, fractured insights, and bad decisions. They also reduce migration and licensing costs and boost productivity across data teams.
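To make the idea of CDC concrete, here is a minimal sketch of how a target can be kept in sync by replaying row-level changes instead of re-copying whole tables. It is an illustration only, not Arcion's implementation: the `ChangeEvent` shape and the `apply_changes` helper are hypothetical names chosen for this example, and the "target" is a plain in-memory dictionary standing in for a table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change captured from a source database (hypothetical shape)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new column values; None for deletes


def apply_changes(
    target: Dict[int, Dict[str, Any]],
    events: Iterable[ChangeEvent],
) -> Dict[int, Dict[str, Any]]:
    """Replay change events against the target in the order they were committed."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: the latest version of the row replaces any earlier one.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # Deleting a row that is already absent is treated as a no-op.
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target


# The target starts with one row, then receives three changes from the source.
target = {1: {"name": "Ada", "plan": "free"}}
events = [
    ChangeEvent("update", 1, {"name": "Ada", "plan": "pro"}),
    ChangeEvent("insert", 2, {"name": "Lin", "plan": "free"}),
    ChangeEvent("delete", 1),
]
replica = apply_changes(target, events)
print(replica)  # {2: {'name': 'Lin', 'plan': 'free'}}
```

Because only the changed rows travel through the pipeline, the target stays current without repeated full-table copies, which is what keeps the load on the source database low.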
Here at Arcion, our goal is to make data more accessible, and while fully-managed CDC pipelines were a tremendous milestone in that endeavor, they are not the end. By becoming a partner of Databricks Partner Connect, we’ve made CDC and the benefits that come with it, such as transactional integrity, low impact, and end-to-end data consistency, far more accessible to the average enterprise and extended the capabilities of the Lakehouse architecture.
Databricks Partner Connect ensures a smooth and secure connection from Arcion’s cloud-native and distributed solution to Databricks through data pipelines that scale autonomously for real-time ingestion. Arcion provides highly concurrent data ingestion from multiple sources to multiple destinations, with robust fault tolerance to ensure data integrity and consistency. Under the Partner Connect ecosystem, configuring the source and destination systems is faster and easier: it takes only minutes to go from logging in to Databricks to starting data replication from transactional databases like Oracle and MySQL and data warehouses like Snowflake.
Streaming data pipelines for real-time analytics and ML
Databricks and Arcion are both built on a no-code approach to empower customers and integrate operational data with enterprise-wide database and application systems using pre-built connectors. It has never been easier (or more reliable) to move data from systems of record such as Oracle, the most popular transactional database; MySQL, the most popular open source relational database; and Snowflake into Databricks for powerful analytics and ML workloads, even with semi-structured and unstructured data.
This is all made possible by Arcion's cutting-edge approach to CDC (Change Data Capture) which offers real-time ingestion into Databricks’ Delta Lake platform. Experts from Databricks and Arcion went into a lot more detail about CDC and real-time ingestion in our latest webinar. The recording of that webinar is available on-demand, for free.
Databricks and Arcion unify siloed databases with reliable, zero-maintenance data pipelines, ensuring zero-loss fault tolerance irrespective of failure point or cause. This also saves data teams hundreds of hours by eliminating manual configuration and the possibility of human error. Promote collaboration across your entire data infrastructure and ML workflows with continuous availability.
Ease of use and unparalleled reliability make Databricks & Arcion the secret weapons of truly data-driven enterprises.
Getting Started With Arcion on Databricks Partner Connect
Customers can get started with Arcion on Databricks Partner Connect in a handful of steps. To start, log in to Databricks Partner Connect and choose Arcion. This starts a workflow that automatically provisions a SQL Endpoint and associated credentials for Arcion to interact with.
Databricks passes the user’s identity and the SQL endpoint configuration to Arcion automatically via a secure API. From here on out, Arcion does all of the heavy lifting. All that’s left for you to do is log in to your Arcion account (or start your free 14-day trial), configure the source database, and select the objects (schemas, tables, and columns) that need to be replicated to Databricks.
Below is a detailed walkthrough of using Arcion within Databricks Partner Connect to help you get started.
Step 1: Log in to Databricks and select Partner Connect
Step 2: Log in to Arcion (with either a username and password or Google Login)
Step 3: Set up a replication
Step 4: Select a data source (Oracle, MySQL, or Snowflake)
Step 5: Select Databricks, the default target
Step 6: Filter the data & start replication
Once the replication starts, users are taken to the Replication Dashboard for a 360-degree view of data pipelines, their performance, and their status, all updated in real time. Through the dashboard, you get access to:
- Lag monitoring and reporting (overall and table-level replication progress)
- A live stream of logs for easy debugging
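For readers curious what overall and table-level lag can mean in practice, the sketch below computes lag as the gap between the newest change committed on the source and the newest change applied on the target. This is one common way to define lag, assumed here for illustration; it is not a description of how the Arcion dashboard calculates its numbers, and the `table_lag` helper and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def table_lag(
    latest_source_commit: Dict[str, datetime],
    latest_applied_commit: Dict[str, datetime],
) -> Dict[str, timedelta]:
    """Per-table lag: newest source commit time minus newest applied commit time."""
    lags = {}
    for table, committed in latest_source_commit.items():
        applied = latest_applied_commit[table]
        # Clamp at zero so a fully caught-up table never reports negative lag.
        lags[table] = max(timedelta(0), committed - applied)
    return lags


# Hypothetical sample data: "orders" is 3 seconds behind, "customers" is caught up.
t0 = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
source = {"orders": t0 + timedelta(seconds=5), "customers": t0}
applied = {"orders": t0 + timedelta(seconds=2), "customers": t0}

lags = table_lag(source, applied)
overall = max(lags.values())  # overall lag is driven by the slowest table

print(lags["orders"])  # 0:00:03
print(overall)         # 0:00:03
```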
Step 7: Verify the replication within Databricks Data Explorer
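Beyond browsing the replicated tables, a simple scripted check is to compare row counts between source and target. The sketch below assumes the counts have already been collected (for example with a `SELECT COUNT(*)` per table on each side); the `find_mismatches` helper, table names, and numbers are hypothetical and only illustrate the comparison step.

```python
from typing import Dict, Optional, Tuple


def find_mismatches(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return tables whose target row count differs from the source.

    Each value is (source_count, target_count); target_count is None when
    the table has not appeared on the target at all.
    """
    mismatches = {}
    for table, expected in source_counts.items():
        actual = target_counts.get(table)
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches


# Hypothetical counts gathered from the source database and from Databricks.
source_counts = {"orders": 1200, "customers": 300, "invoices": 75}
target_counts = {"orders": 1200, "customers": 298}

mismatches = find_mismatches(source_counts, target_counts)
print(mismatches)  # {'customers': (300, 298), 'invoices': (75, None)}
```

With a live replication, a small difference may simply reflect changes that are still in flight, so a mismatch is a prompt to check lag before assuming data loss.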
Arcion on Databricks Partner Connect automates the entire data ingestion process for faster analytics. And the best part is that you can get started right away with a free 14-day trial that includes all the features of Arcion Cloud, with no payment info required. Give it a try and let us know how you like it.
Finally, before you leave, here is a video tutorial that gives you a step-by-step overview. Enjoy!