Keeping data in sync across different platforms in real time is always a challenge. The technologies that support real-time sync are abundant, but so are their potential drawbacks, and the right choice depends on the platforms involved. For migrating and replicating data from MySQL to SingleStore, Change Data Capture (CDC) is one of the best and most popular options available.
Change Data Capture (CDC) is a method for identifying changes that occur in a source database and replicating them to a target database or data warehouse. The changes are detected in real-time, allowing operations to be carried out as the data is replicated to a specific target location. This is all done without scheduling updates to data at set intervals, as in older “batch-style” data replication.
Using CDC effectively saves companies from unnecessary data work because it concentrates only on changes to records, such as inserts and updates, rather than on the entire database. Once the source and target databases are in sync, only the changes to the data need to be moved to the target, which is more efficient than replicating or reloading the entire database. CDC collects data from a database server and replicates it to a data warehouse, data lake, or another database in real or near-real time.
What is SingleStore (prev. MemSQL)?
SingleStore, previously called MemSQL, is a high-performance relational database management system (RDBMS) designed for real-time operational analytics. SingleStoreDB is the company's flagship product. It is a purpose-built DBMS optimized for streaming data and delivering real-time analytics.
SingleStoreDB can handle high volumes of data with low latency and supports a wide variety of data types. It is also scalable and easily deployed on-premises or in the cloud. SingleStoreDB is well suited for applications that require fast data ingestion, complex queries, and real-time analytics.
From the brief explanation above, it’s evident that pairing a CDC solution with SingleStore makes it easy to move away from a traditional data storage architecture. CDC offers a lightweight approach to data migration and real-time replication between multiple platforms, allowing for quick and easy updates to data that can benefit areas like Business Intelligence (BI), Machine Learning, and Artificial Intelligence.
Implementing CDC with SingleStore
CDC comes in a few different flavors; depending on the technology used to implement it and what the source database supports, sometimes only a subset of these will be available. CDC events can be sourced through several different approaches:
- Time-based: uses required timestamp column as a source for data
- Checksum-based: leverages checksum on tables in source/target, using differences as a source for data
- Log-based: reads database logs as a source for data
Of the three CDC methods, log-based CDC is considered the most effective approach. It doesn’t require direct access to the database, making it non-intrusive and less likely to impact the source system. Log-based CDC also meets real-time replication needs and doesn’t incur any downtime when moving data between source and target systems. In the next part, we will look at achieving real-time data integration using log-based CDC. A general overview of how this is done will show you what is required to build real-time applications that serve your organization’s use cases.
First, let’s take a look at two ways to migrate data from an existing MySQL database into SingleStore.
Migrating Data From MySQL to SingleStore using SingleStore Replicate
In this first method, we will use the SingleStore Replicate tool to migrate data from a MySQL database to SingleStore. Replicate is a lightweight migration tool that assists with bulk data loading for database migrations to SingleStore, and it can also perform incremental CDC after the initial load of data. Using the migration and replication features together allows users to test their workload on SingleStore and then perform a zero-downtime cutover when moving to production.
Step 1: Create the Source Database Objects in SingleStore
The source database objects can be created in SingleStore using Data Definition Language (DDL) statements.
The CREATE DATABASE statement is used to create the source database object. Its main options are:

- <database_name>: the name assigned to the new SingleStore database.
- WITH (SYNC | ASYNC) DURABILITY: specifies whether in-memory database updates made by DDL and DML commands are saved to the log on disk synchronously or asynchronously. Sync durability is enabled by default, ensuring durability against all single-node failures.
- PARTITIONS <count>: sets the total number of partitions that the data will be split into.
- DEFAULT CHARSET=: accepted by SingleStore for compatibility with MySQL.
- ON WORKSPACE <workspace_name>: creates a read/write database within a workspace that has already been created.
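Putting these options together, a minimal CREATE DATABASE statement might look like the sketch below. The database name, partition count, and charset are illustrative placeholders, and exact option support varies by SingleStore version, so check the CREATE DATABASE reference for your deployment.

```sql
-- Create the target database in SingleStore (names and values are illustrative).
CREATE DATABASE IF NOT EXISTS mysql_migration
    PARTITIONS 8                  -- split the data into 8 partitions
    DEFAULT CHARSET = utf8mb4;    -- accepted for MySQL compatibility
```

If you need to override the default durability behavior, the WITH (SYNC | ASYNC) DURABILITY clause described above can be added as well.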
Step 2: Load a Snapshot of the Source Data into SingleStore
After the source database object is created, the next step is to load a snapshot of the source data from MySQL into SingleStore. The snapshot data is batched and written into SingleStore, and Replicate can also apply incremental data changes made on the source system. Depending on network speed and the amount of data being migrated, this process can take hours or days.
SingleStore’s SNAPSHOT DATABASE command takes a snapshot of the given database and truncates the transaction log.

- database_name: the name of the database to snapshot.

The command forces a manual snapshot and runs on the master aggregator. It also causes implicit commits, and a user must have either the SUPER or BACKUP privilege to run it.
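As a sketch, the command looks like the following; the database name is the illustrative one used earlier:

```sql
-- Take a manual snapshot of the database and truncate its transaction log.
-- Runs on the master aggregator; requires the SUPER or BACKUP privilege.
SNAPSHOT DATABASE mysql_migration;
```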
Step 3: Enable Incremental CDC on SingleStore
After loading the snapshot of the source data into SingleStore, you can now apply incremental CDC to the data. We will use SingleStore’s Replicate tool in order to enable this.
The sample syntax below shows an example of a table in MySQL, along with a query that returns its total number of rows. We will use this table to illustrate how data is applied from MySQL to SingleStore.
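As an illustration, a scan table in MySQL and its row-count query might look like this; the column layout here is hypothetical and will differ in your environment.

```sql
-- Hypothetical MySQL definition of the scan table used in this example.
CREATE TABLE scan (
    scan_id    BIGINT NOT NULL AUTO_INCREMENT,
    package_id BIGINT NOT NULL,
    loc_id     INT,
    scan_ts    DATETIME,
    PRIMARY KEY (scan_id)
);

-- Query the total number of rows in the table.
SELECT COUNT(*) AS total_rows FROM scan;
```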
The scan table above contains 7.3 million records, which we will configure to sync between MySQL and SingleStore. This data replication capability is achieved through the SingleStore Replicate tool. The first step is to populate the Replicate configuration file with the endpoints of both databases.
Below is a sample of a SingleStore configuration file.
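A Replicate configuration file pairs the source and target connection details in one place. The sketch below is a hypothetical example: the hosts, ports, credentials, and database names are placeholders, and the exact file format depends on your Replicate version, so consult the SingleStore Replicate documentation for the authoritative schema.

```yaml
# Hypothetical Replicate configuration; all endpoints and credentials
# are placeholders for your own MySQL and SingleStore instances.
source:
  type: MYSQL
  host: mysql.example.internal
  port: 3306
  username: replicate_user
  password: "********"
  database: tpch

target:
  type: SINGLESTORE
  host: svc-singlestore.example.internal
  port: 3306
  username: admin
  password: "********"
  database: mysql_migration
```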
The next step is to execute the Replicate command based on the configuration file. An example of this can be seen below:
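As a sketch, the invocation takes the configuration file as an argument. The binary name, subcommand, and file path below are illustrative, so check your Replicate distribution for the exact syntax:

```shell
# Run an initial snapshot followed by continuous CDC, driven by the
# configuration file above (command shape is illustrative).
./replicate full ./replication.yaml
```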
To verify if the database and corresponding tables exist, run the statement below to query the data in SingleStore:
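For example, the following checks (using the illustrative database and table names from earlier) confirm that the schema exists and that the row count matches the source:

```sql
-- Confirm the database and table were created, then compare the row
-- count against the 7.3 million rows in the MySQL source.
SHOW DATABASES;
USE mysql_migration;
SHOW TABLES;
SELECT COUNT(*) AS total_rows FROM scan;
```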
At this point, the data should have migrated from MySQL to your SingleStore database instance.
Using mysqldump to Migrate Data to SingleStore
We can also use MySQL’s mysqldump client utility to migrate data from MySQL to SingleStore.
The mysqldump utility performs logical backups that produce a set of SQL statements that can be executed to reproduce the original database object definitions and table data. It dumps one or more MySQL databases for backup or to migrate data to another SQL database.
We will continue to use the MySQL table above, which contains 7.3 million records. You’ll need to ensure that the table is created in your MySQL instance.
To use the mysqldump command, open the terminal on the server where your MySQL instance is running and run the command below against your MySQL instance.
mysqldump command template:
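In its general form, mysqldump takes connection details, the database to dump, and optionally specific tables, and writes the resulting SQL statements to standard output, which is typically redirected to a file:

```shell
# Template: dump a database (or a specific table within it) to a .sql file.
mysqldump -h <host> -P <port> -u <user> -p <database_name> [<table_name>] > <output_file>.sql
```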
mysqldump command example:
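An illustrative invocation, dumping the scan table from a database here assumed to be named tpch on a local MySQL instance, might look like this:

```shell
# Dump only the scan table; you will be prompted for the password.
mysqldump -h 127.0.0.1 -P 3306 -u root -p tpch scan > scan_dump.sql
```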
The mysqldump output is made up of DDL and DML. In the case above, the output will look similar to this:
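Here is an abbreviated, illustrative sketch of that output, using the hypothetical scan table from earlier: the table DDL followed by batched INSERT statements (a real dump also includes session-variable boilerplate, omitted here):

```sql
-- DDL: recreate the table definition.
DROP TABLE IF EXISTS `scan`;
CREATE TABLE `scan` (
  `scan_id` bigint NOT NULL AUTO_INCREMENT,
  `package_id` bigint NOT NULL,
  `loc_id` int DEFAULT NULL,
  `scan_ts` datetime DEFAULT NULL,
  PRIMARY KEY (`scan_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- DML: batched inserts containing the table data.
LOCK TABLES `scan` WRITE;
INSERT INTO `scan` VALUES (1,1000,7,'2022-05-01 10:15:00'),(2,1001,7,'2022-05-01 10:16:12');
UNLOCK TABLES;
```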
Since SingleStore is a distributed SQL database, the shard and sort keys will need to be added to the table. This is done by simply editing the DDL as highlighted below:
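Continuing with the hypothetical scan table, the edited DDL might look like the following. The SHARD KEY and SORT KEY column choices here are illustrative; pick columns that match your data distribution and query patterns:

```sql
-- Edited for SingleStore: SHARD KEY controls how rows are distributed
-- across partitions, and SORT KEY controls the columnstore sort order.
CREATE TABLE `scan` (
  `scan_id` bigint NOT NULL AUTO_INCREMENT,
  `package_id` bigint NOT NULL,
  `loc_id` int DEFAULT NULL,
  `scan_ts` datetime DEFAULT NULL,
  PRIMARY KEY (`scan_id`),
  SHARD KEY (`scan_id`),
  SORT KEY (`scan_ts`)
);
```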
The mysqldump output file should then be saved. Then, we will create a new database in our SingleStore cluster using the output file from the mysqldump command. We will run one last command to move the data into SingleStore.
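Since SingleStore is wire-compatible with MySQL, the standard mysql client can load the dump file. A sketch, using placeholder host and credentials along with the illustrative file and database names from earlier, looks like this:

```shell
# Create the target database, then load the edited dump file into it.
mysql -h svc-singlestore.example.internal -P 3306 -u admin -p -e "CREATE DATABASE IF NOT EXISTS mysql_migration"
mysql -h svc-singlestore.example.internal -P 3306 -u admin -p mysql_migration < scan_dump.sql
```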
The tables and data should now appear in your SingleStore database.
Why Use Arcion?
If you’re looking for the fastest and easiest way to set up CDC to migrate and replicate data from MySQL to SingleStore, Arcion has you covered. Arcion is a no-code solution for enabling CDC on your data sitting in MySQL, and it offers both a self-hosted edition and a fully managed cloud product to fit your exact needs.
By using Arcion, it’s easy for organizations to build pipelines using MySQL as a data source and SingleStore as a target. Arcion also supports moving data to other big data platforms such as Snowflake or Databricks, with no code required. For this particular use case, Arcion delivers real-time CDC that is much easier and more flexible than inefficient batch-style data replication, which doesn’t support real-time use cases.
Benefits of using Arcion include:
- No-code connector support for 20+ sources and target databases and data warehouses
- Multiple deployment types supported across cloud and on-premise installations
- Configuration can easily be done through UI (available both in self-hosted & cloud), with minimal effort and zero code
- Automatic schema conversion & schema evolution support out-of-the-box (including SQL and NoSQL conversion)
- Patent-pending distributed & highly scalable architecture: Arcion is the only end-to-end multi-threaded CDC solution on the market that auto-scales vertically & horizontally. Any process that Arcion runs on Source & Target is parallelized using patent-pending techniques to achieve maximum throughput.
- Built-in high availability (HA): Arcion is designed with high availability built-in. It makes the pipeline robust without disruption and data is always available in the target, in real-time.
- Auto-recovery (patent-pending): Internally, Arcion does a lot of check-pointing. Therefore, any time the process gets killed for any reason (e.g., database, disk, network, server crashes), it resumes from the point where it was left off, instead of restarting from scratch. The entire process is highly optimized with a novel design that makes the recovery extremely fast.
Moving Data from MySQL to SingleStore with Arcion Cloud
Before we jump into a step-by-step demo of moving data from MySQL to SingleStore, below is a video demo captured from a joint webinar hosted by SingleStore & Arcion in May 2022. In this live demo given by Arcion founder & CTO, Rajkumar Sen, you'll see:
- Live ingest 10M rows of TPCH schema from Oracle to SingleStore - in a matter of minutes!
- Perform large transactions on Oracle (10k+ row updates) using log-based CDC, and see near-instant changes reflected on the target
- Consolidate transactional data from Oracle (10M rows) & MySQL (1M rows) into SingleStore
- Perform transactions on tables in Oracle and MySQL and see those transactions replicated in real-time
For those who prefer to follow from a doc format, please read on.
In this walkthrough, we will use Arcion Cloud to demonstrate how easy data migration and real-time replication can be. For those who want to deploy Arcion on-prem or in their VPC to satisfy security requirements, you can download Arcion Self-hosted for free to achieve the same results. Here are the docs showing how to set up MySQL as the source and SingleStore as the target with the self-hosted version.
Assuming the data source and target are created, we will log into Arcion Cloud to set up our replication pipeline to enable CDC (log into Arcion Cloud here).
Next, we will select our replication mode and write mode. A few options are available to suit your needs. For replication modes, Arcion supports:
- Snapshot (the initial load)
- Full (snapshot + CDC)
For write modes, Arcion supports:
We will select the replication mode as Full and the write mode as Truncating for our purposes here. You will also see that I have named the replication “MySQL to SingleStore.”
Once the Name is populated and the Replication and Write Modes are selected, click Next at the bottom of the screen. We are then brought to the Source screen. From here we will click the Create New button.
We then will select MySQL as our source, and then, scroll to the bottom of the page and click Continue.
Now, we can add in our MySQL instance details. These details include:
- Connection Name
All other fields will default. For Username and Password, we will use the users created in the script we ran earlier against our MySQL instance.
Once the connection is saved, we will want to pull in the schema from the database. Arcion Cloud will prompt us to click the Sync Connector button on the next page. Click the button, and Arcion Cloud will connect to our MySQL instance and pull down the schema.
Once completed, the UI in Arcion Cloud will display the retrieved schema. Then we will click Continue in the bottom right corner of the screen to move to the next step.
We now have our data source correctly configured. This will be displayed on the next screen, as well as a Test Connection button. To ensure that everything is working correctly, we will click the Test Connection button.
The results should look like this once the test is finished running. Finally, you can click the Done button to exit.
With our test successful, we can now click Continue to Destination in the bottom right corner of the screen to move to the steps where we set up our destination.
On the Destination screen, we will click New Connection to start the setup of our SingleStore connector.
Then, select SingleStore as your Connection Type and click Continue.
On the next screen, input your connection details. These details include:
- Connection Name
All other fields will default. For username and password, we will use the users created in the script we ran earlier against our SingleStore instance.
Once you put in all the details and click Continue, you’ll see the SingleStore connection is created.
Now, we can click Continue to Filter to begin the Filter configuration for our pipeline. On the Filters screen, we will check the Select All checkbox to replicate all of our tables and columns from the source to the destination.
You can also click on the Map Tables and Per Table Configs (Applier Configuration Docs, Extractor Configuration Docs) buttons to add further configuration. For our purposes, we will leave these as their default values. After this, you will click Start Replication.
Once the initial data is loaded, the pipeline will continue to run, monitor for changes, and apply those changes to the destination. The idle pipeline will still show RUNNING in the top right of the screen but will show a row replication rate of 0 until new data is written to the source. You’ll also notice that the Phase description of the pipeline will now show Change Data Capture instead of Loading Snapshot Data.
If we start adding data to the MySQL instance (for example, by rerunning our load script), we will see that Arcion detects this and then syncs that data over to SingleStore in real-time.
This article gave you a detailed overview of Change Data Capture with MySQL and SingleStore. You were first introduced to CDC, then to some native solutions of migrating data (like using mysqldump), and finally took a look at Arcion as the alternative no-code CDC solution.
To get a zero-code, distributed CDC solution that works seamlessly with MySQL and SingleStore, try out Arcion and have a scalable and reliable CDC-enabled pipeline set up in minutes. If you’re looking for a flexible on-premise solution, try out Arcion Self-hosted. Regardless of the deployment type chosen, Arcion provides unparalleled flexibility and scalability for data pipelines using MySQL as a data source and having SingleStore as the target destination for the data.
Before You Leave, Here Are Some FAQs
Still looking for more info? Let’s take a quick look at some of the most frequently asked questions about how to implement CDC between MySQL and SingleStore.
Does MySQL Support CDC?
MySQL is one of the most popular databases in the world and has been for quite some time. Typically, data is moved out of MySQL in batches, with pipelines created from time to time to move or copy data from MySQL to a data warehouse. These bulk imports and exports create gaps in the data, both because errors encountered during the process can delay it and because the operations do not happen in real-time.
CDC is a technique that captures database inserts, updates, and deletes from data sources such as MySQL, providing an alternative to inefficient bulk imports. Because it works in real-time, it has become a preferred, lower-fuss way of handling changes to your data.
MySQL, therefore, is able to support CDC. This paradigm shift answers real-time sync requirements by capturing changes, keeping downstream systems up-to-date, and decreasing load, since only relevant information is updated.
Is SingleStore An In-Memory Database?
Yes. SingleStore is an in-memory database. An in-memory database offers significant performance improvements: because the data is held in memory, the database can respond very quickly to simpler queries. An in-memory database can also maintain a pre-aggregated cache, allowing it to serve a large number of queries from many users at once.
SingleStore is so much more than just an in-memory database, though. It functions as an in-memory database when needed but also handles other workloads such as data warehouse workloads using a tiered storage model which includes in-memory, on-disk storage, and cloud object store to deliver fast performance.
Is SingleStore a Relational Database?
SingleStore is a relational database management system with a SQL interface. It stores data in memory and has a distributed architecture, typically running on a cluster of servers. It gives users the option to read and write data directly to and from main memory by accessing data in DRAM, and it can run in any Linux environment, including physical hardware (bare metal), virtual machines, and containerized deployments.