MySQL CDC to MemSQL (Now SingleStore): Load Data in Real-Time With CDC

Luke Smith
Enterprise Solutions Architect
August 22, 2022
Matt Tanner
Developer Relations Lead
August 22, 2022
January 25, 2023
15 min read

Keeping data in sync across different platforms in real time is always a challenge. The technologies that support this are abundant, but so are their potential drawbacks, and the right choice depends on which platforms are involved. For migrating and replicating data from MySQL to SingleStore, Change Data Capture, also known as CDC, is one of the best and most popular options available.

Change Data Capture (CDC) is a method for identifying changes that occur in source databases that are then replicated to a target database or data warehouse. The changes are detected in real-time, giving room for operations to be carried out as the data is replicated to a specific target location. This is all done without scheduling updates to data at set intervals, reminiscent of older “batch-style” data replication.

Using CDC effectively saves companies from unnecessary data work, since it concentrates only on changes to records, such as inserts, updates, and deletes, rather than the entire database. Once the source and target databases are in sync, only the changes to the data need to be moved to the target. This is far more efficient than replicating or reloading the entire database. CDC enables real-time movement by collecting changes from a database server and replicating them to a data warehouse, data lake, or other databases in real or near real-time.
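To make the idea concrete, here is a minimal, self-contained Python sketch, not tied to any particular CDC tool, showing how a stream of change events can be applied to a target table so that only the changed rows ever move. The event format here is hypothetical; real CDC tools emit richer records.

```python
# Minimal illustration of applying CDC events to a target table.
# Rows are keyed by primary key; the target starts in sync with a snapshot.

def apply_cdc_events(target, events):
    """Apply insert/update/delete events to a dict-based target table."""
    for event in events:
        op, key, row = event["op"], event["key"], event.get("row")
        if op in ("insert", "update"):
            target[key] = row          # upsert only the changed row
        elif op == "delete":
            target.pop(key, None)      # remove the deleted row
    return target

# Target starts in sync with the source snapshot.
target_table = {1: {"loc": "CA"}, 2: {"loc": "FL"}}

changes = [
    {"op": "insert", "key": 3, "row": {"loc": "NY"}},
    {"op": "update", "key": 1, "row": {"loc": "TX"}},
    {"op": "delete", "key": 2},
]

apply_cdc_events(target_table, changes)
print(target_table)  # {1: {'loc': 'TX'}, 3: {'loc': 'NY'}}
```

The point is that the work done is proportional to the number of changes, not the size of the table.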


What is SingleStore (prev. MemSQL)?

SingleStore, previously called MemSQL, is a high-performance relational database management system (RDBMS) designed for real-time operational analytics. SingleStoreDB is the company's flagship product. It is a purpose-built DBMS optimized for streaming data and delivering real-time analytics.

SingleStoreDB can handle high volumes of data with low latency and supports a wide variety of data types. It is also scalable and easily deployed on-premises or in the cloud. SingleStoreDB is well suited for applications that require fast data ingestion, complex queries, and real-time analytics.

From the brief explanation above, it’s evident that using a CDC solution and SingleStore makes it very easy to move away from a traditional data storage architecture. Using CDC as the method for data migration and replication between multiple platforms allows for quick and easy updates to data. Using CDC allows for a lightweight approach to applying real-time replication that can benefit areas like Business Intelligence (BI), Machine Learning, and Artificial Intelligence.

Implementing CDC with SingleStore

CDC comes in a few different flavors, and depending on the technology used to implement it and what the source database supports, sometimes only a subset of these will be available. CDC events can be sourced through several different approaches:

  • Time-based: uses a required timestamp column to identify changed rows
  • Checksum-based: compares checksums on source and target tables, using the differences to identify changed data
  • Log-based: reads the database's transaction logs to identify changes

Of the three CDC methods, log-based CDC is considered the most effective approach. It doesn't query the source tables directly, making it non-intrusive and less likely to impact the source system. Log-based CDC also meets real-time replication needs and doesn't incur downtime when moving data between source and target systems. In the next part, we will look at achieving real-time data integration using log-based CDC. This general overview shows what is required to build real-time applications that serve your organization's use cases.
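As a prerequisite for log-based CDC against MySQL, the server's binary log must be enabled and set to row format. As a sketch, the relevant server options (standard MySQL settings; adjust the ID and log name for your environment) look like this:

```ini
# my.cnf ([mysqld] section): settings commonly required for log-based CDC
[mysqld]
server-id        = 1          # unique, non-zero server ID
log_bin          = mysql-bin  # enables binary logging
binlog_format    = ROW        # emit row-level change events
binlog_row_image = FULL       # log full before/after row images
```

You can confirm the active setting from a MySQL client with `SHOW VARIABLES LIKE 'binlog_format';`.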

First, let’s take a look at two ways to migrate data from an existing MySQL database into SingleStore.

Migrating Data From MySQL to SingleStore using SingleStore Replicate

In this first method, we will look at using the SingleStore Replicate tool to migrate data from a MySQL database to SingleStore. Replicate is a lightweight migration tool that assists with bulk data loading for database migrations to SingleStore. Replicate can also be used for incremental CDC after the initial load of data. Using both the migration and replication features together allows users to test out their workload on SingleStore, then perform a zero-downtime cutover when moving to production.

Step 1: Create the Source Database Objects in SingleStore

The source database objects can be created in SingleStore using Data Definition Language (DDL) statements.

The syntax below is an example of how to create the source database object.

CREATE DATABASE [IF NOT EXISTS] <database_name> [<create_specification>] [PARTITIONS n]
    [<sync_options>]
    [ON WORKSPACE <workspace_name>]

<sync_options>:
    WITH {SYNC | ASYNC} DURABILITY
  | WITH {SYNC | ASYNC} REPLICATION
  | WITH SYNC DURABILITY SYNC REPLICATION
  | WITH ASYNC DURABILITY ASYNC REPLICATION

<create_specification>:
    [DEFAULT] CHARACTER SET [=] charset_name

-- "schema" is an alias for "database"
CREATE SCHEMA [IF NOT EXISTS] <database_name>

database_name: 

This is the name assigned to the new SingleStore database.

SYNC REPLICATION: 

This is enabled by default, ensuring durability against all single-node failures.

WITH (SYNC | ASYNC) DURABILITY: 

This specifies whether in-memory database updates made by DDL and DML commands are saved to the log on disk synchronously or asynchronously.

PARTITIONS n: 

This allows you to set the total number of partitions that the data will be split into.

CREATE DATABASE … DEFAULT CHARSET=: 

This is accepted by SingleStore for compatibility with MySQL.

CREATE DATABASE … ON WORKSPACE <workspace_name>;: 

This is used to create a read/write database within a workspace that has already been created.
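Putting these options together, an illustrative invocation (the database name and partition count here are placeholders, not values from this walkthrough) might look like:

```sql
CREATE DATABASE IF NOT EXISTS destination_db
    PARTITIONS 8
    WITH SYNC DURABILITY SYNC REPLICATION;
```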

Step 2: Load a Snapshot of the Source Data into SingleStore

After the source database objects are created, the next step is to load a snapshot of the source data from MySQL into SingleStore. The snapshot data is batched and written into SingleStore. Using Replicate, we can also apply incremental data changes made on the source system. This process can take hours or days, depending on the network speed and the amount of data you are migrating.

The syntax below takes a snapshot of the given database and truncates the transaction log.

SNAPSHOT [DATABASE] <database_name>;

database_name: 

This is the name of the database to snapshot.

SNAPSHOT DATABASE: 

This forces a manual snapshot and it runs on the master aggregator.

The command also causes implicit commits, and a user must have either the SUPER or BACKUP privilege to run it.
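For example, snapshotting an illustrative database named destination_db would look like:

```sql
SNAPSHOT DATABASE destination_db;
```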

Step 3: Enable Incremental CDC on SingleStore

After loading the snapshot of the source data into SingleStore, you can now apply incremental CDC to the data. We will use SingleStore’s Replicate tool in order to enable this.

The sample syntax below shows an example of a table in MySQL and a subsequent query showing the total number of rows. We will use this table to illustrate how data is applied from MySQL to SingleStore.

CREATE TABLE `scan` (
  `scan_id` bigint(20) NOT NULL,
  `package_id` bigint(20) NOT NULL,
  `loc_id` int(11) DEFAULT NULL,
  `Loc_name` char(5) DEFAULT NULL,
  PRIMARY KEY (`package_id`,`scan_id`)
) ;
 
select count(*) from scan;
7340032

The scan table above includes 7.3 million records, which we will configure to sync between MySQL and SingleStore. This data replication capability is achieved through the SingleStore Replicate tool. The first step is to populate the connection configuration files with the endpoints of both databases.

Below is a sample of a SingleStore configuration file.

type: SINGLESTORE
 
host: svc-1732741a-f499-467c-a722-9887d73150c1-ddl.aws-virginia-2.svc.singlestore.com
port: 3306
 
username: 
password: 
 
#credential-store:
#  type: PKCS12
#  path: #Location of key-store
#  key-prefix: "SingleStore_"
#  password: #If password to key-store is not provided then default password will be used
 
max-connections: 30
 
max-retries: 10
retry-wait-duration-ms: 1000
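Replicate also needs a matching configuration file for the MySQL source. Below is a sketch of what conf/conn/mysql.yaml might contain; the keys mirror the SingleStore file above, but treat the exact field names and values as assumptions and check the Replicate documentation for your version:

```yaml
type: MYSQL

host: localhost            # hypothetical MySQL endpoint
port: 3306

username: replicate_user   # placeholder credentials
password: ""

max-connections: 30

max-retries: 10
retry-wait-duration-ms: 1000
```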

The next step is to execute the Replicate command against the configuration files. The example below runs the initial snapshot; as noted earlier, Replicate can also continue with incremental CDC after the initial load. An example can be seen below:

./bin/replicant snapshot conf/conn/mysql.yaml conf/conn/singlestore.yaml

To verify that the database and corresponding tables exist, run the statement below to query the data in SingleStore:

select count(*) from scan; --7340032

At this point, the data should have migrated from MySQL to your SingleStore database instance.

Using mysqldump to Migrate Data to SingleStore

We can also use MySQL's mysqldump client utility to migrate data from MySQL to SingleStore.

The mysqldump utility performs logical backups that produce a set of SQL statements that can be executed to reproduce the original database object definitions and table data. It dumps one or more MySQL databases for backup or to migrate data to another SQL database. 

The scan table from earlier, which contains 7.3 million records, will be used again. You'll need to ensure that the table is created in your MySQL instance.

CREATE TABLE `scan` (
  `scan_id` bigint(20) NOT NULL,
  `package_id` bigint(20) NOT NULL,
  `loc_id` int(11) DEFAULT NULL,
  `Loc_name` char(5) DEFAULT NULL,
  PRIMARY KEY (`package_id`,`scan_id`)
) ;
 
select count(*) from scan;
7340032

To use the mysqldump command, open the terminal on the server where your MySQL instance is running and run the command below against your MySQL instance.

mysqldump command template:

mysqldump -h <host> -u <username> -p <database_name> > <filename>.sql

mysqldump command example:

mysqldump -h 127.0.0.1 -u <username> -p scan > scan_dump.sql

The mysqldump output is made up of DDL and DML statements. In the case above, the output will look similar to this:

CREATE TABLE `scan` (
  `scan_id` bigint(20) NOT NULL,
  `package_id` bigint(20) NOT NULL,
  `loc_id` int(11) DEFAULT NULL,
  `Loc_name` char(5) DEFAULT NULL,
  PRIMARY KEY (`package_id`,`scan_id`)
);
 
INSERT INTO scan VALUES (1001, 1, 1, 'CA');
INSERT INTO scan VALUES (1002, 1, 2, 'FL');

Since SingleStore is a distributed SQL database, shard and sort keys will need to be added to the table. This is done by editing the DDL as shown below:

create table scan (
 scan_id BIGINT NOT NULL,
 package_id BIGINT NOT NULL,
 loc_id INT,
 loc_name CHAR(5),
 KEY (scan_id) USING CLUSTERED COLUMNSTORE,
 SHARD KEY (package_id) );

After saving the edited mysqldump output file, we will create a new database in our SingleStore cluster and run one last command to load the data from the output file into SingleStore.

Command template:

mysql -u admin -h <host> --default-auth=mysql_native_password -p <database_name> < <filename>.sql

Command example:

mysql -u admin -h svc-your-cluster-id-here-ddl.aws-virginia-2.svc.singlestore.com --default-auth=mysql_native_password -p destination_db < scan_dump.sql

The tables and data should now appear in your SingleStore database.

Why Use Arcion?

If you’re looking for the fastest and easiest way to set up CDC to migrate and replicate data from MySQL to SingleStore, Arcion has you covered. Arcion is a no-code solution for enabling CDC on your data sitting in MySQL. Arcion offers the flexibility of both a self-hosted edition and a fully managed cloud product to fit your exact needs.

By using Arcion, it’s easy for organizations to build pipelines using MySQL as a data source and SingleStore as a target. Arcion also supports moving data to other big data platforms such as Snowflake or Databricks, with no code required. For this particular use case, Arcion supports real-time CDC that is much easier and more flexible than using inefficient batch-style data replication that doesn’t support real-time use cases.

Benefits of using Arcion include:

  • No-code connector support for 20+ sources and target databases and data warehouses
  • Multiple deployment types supported across cloud and on-premise installations
  • Configuration can easily be done through UI (available both in self-hosted & cloud), with minimal effort and zero code
  • Automatic schema conversion & schema evolution support out-of-the-box (including SQL and NoSQL conversion) 
  • Patent-pending distributed & highly scalable architecture: Arcion is the only end-to-end multi-threaded CDC solution on the market that auto-scales vertically & horizontally. Any process that Arcion runs on Source & Target is parallelized using patent-pending techniques to achieve maximum throughput. 
  • Built-in high availability (HA): Arcion is designed with high availability built-in. It makes the pipeline robust without disruption and data is always available in the target, in real-time.
  • Auto-recovery (patent-pending): Internally, Arcion does a lot of check-pointing. Therefore, any time the process gets killed for any reason (e.g., database, disk, network, server crashes), it resumes from the point where it was left off, instead of restarting from scratch. The entire process is highly optimized with a novel design that makes the recovery extremely fast.  

Moving Data from MySQL to SingleStore with Arcion Cloud

Before we jump into a step-by-step demo of moving data from MySQL to SingleStore, below is a video demo captured from a joint webinar hosted by SingleStore & Arcion in May 2022. In this live demo given by Arcion founder & CTO, Rajkumar Sen, you'll see:

  • Live ingestion of 10M rows of the TPC-H schema from Oracle to SingleStore, in a matter of minutes!
  • Perform large transactions on Oracle (10k+ row updates) using log-based CDC, and see near-instant changes reflected on the target
  • Consolidate transactional data from Oracle (10M rows) & MySQL (1M rows) into SingleStore
  • Perform transactions on tables in Oracle and MySQL and see those transactions replicated in real-time

For those who prefer to follow from a doc format, please read on.

In this walkthrough, we will use Arcion Cloud to demonstrate how easy it is to set up data migration and real-time replication. For those who want to deploy Arcion on-prem or in a VPC to satisfy security requirements, you can download Arcion Self-hosted for free to achieve the same results. Here are the docs showing how to set up MySQL as the source and SingleStore as the target with the self-hosted version.

Assuming the data source and target are created, we will log into Arcion Cloud to set up our replication pipeline to enable CDC (log into Arcion Cloud here).

Create new replication using Arcion Cloud

Next, we will select our replication mode and write mode. A few options are available to suit your needs. For replication modes, Arcion supports:

  • Snapshot (the initial load) 
  • Full (snapshot + CDC)

For write modes, Arcion supports:

  • Replacing
  • Truncating

We will select the replication mode as Full and the write mode as Truncating for our purposes here. You will also see that I have named the replication “MySQL to SingleStore.”

Choose replication mode within Arcion Cloud

Once the Name is populated and the Replication and Write Modes are selected, click Next at the bottom of the screen. We are then brought to the Source screen. From here we will click the Create New button.

Create new source within Arcion Cloud

We then will select MySQL as our source, and then, scroll to the bottom of the page and click Continue.

Select MySQL as the Source within Arcion Cloud

Now, we can add in our MySQL instance details. These details include:

  • Connection Name
  • Host
  • Port
  • Username
  • Password

All other fields will default. For Username and Password, we will use the users created in the script we ran earlier against our MySQL instance.

Once the connection is saved, we will want to pull in the schema from the database. Arcion Cloud will prompt us to click the Sync Connector button on the next page. Click the button, and Arcion Cloud will connect to our MySQL instance and pull down the schema.

Sync MySQL connector within Arcion Cloud

Once completed, the UI in Arcion Cloud will display the retrieved schema. Then we will click Continue in the bottom right corner of the screen to move to the next step.

Retrieved schema from MySQL within Arcion Cloud

We now have our data source correctly configured. This will be displayed on the next screen, as well as a Test Connection button. To ensure that everything is working correctly, we will click the Test Connection button.

Test MySQL connector within Arcion Cloud

The results should look like this once the test is finished running. Finally, you can click the Done button to exit.

Test connector result

With our test successful, we can now click Continue to Destination in the bottom right corner of the screen to move to the steps where we set up our destination.

Continue to Destination within Arcion Cloud

On the Destination screen, we will click New Connection to start the setup of our SingleStore connector.

Add Destination within Arcion Cloud

Then, select SingleStore as your Connection Type and click Continue.

Add SingleStore as Target database within Arcion Cloud

On the next screen, input your connection details. These details include:

  • Connection Name
  • Host
  • Port
  • Username
  • Password

All other fields will default. For username and password, we will use the users created in the script we ran earlier against our SingleStore instance.

Connect SingleStore within Arcion Cloud

Once you put in all the details and click Continue, you’ll see the SingleStore connection is created.

SingleStore added as Target database

Now, we can click Continue to Filter to begin the Filter configuration for our pipeline. On the Filters screen, we will check the Select All checkbox to replicate all of our tables and columns from the source to the destination.

Filter configuration for the pipeline within Arcion Cloud

You can also click on the Map Tables and Per Table Configs (Applier Configuration Docs, Extractor Configuration Docs) buttons to add further configuration. For our purposes, we will leave these as their default values. After this, you will click Start Replication.

Arcion Cloud Dashboard: Snapshot Mode

Once the initial data is loaded, the pipeline will continue to run, monitor for changes, and apply those changes to the destination. The idle pipeline will still show RUNNING in the top right of the screen but will show a row replication rate of 0 until new data is written to the source. You’ll also notice that the Phase description of the pipeline will now show Change Data Capture instead of Loading Snapshot Data.

If we start adding data to the MySQL instance (for example, by rerunning our load script), we will see that Arcion detects this and then syncs that data over to SingleStore in real-time.

Arcion Cloud Dashboard: CDC Mode

Conclusion

This article gave you a detailed overview of Change Data Capture with MySQL and SingleStore. You were first introduced to CDC, then to some native solutions of migrating data (like using mysqldump), and finally took a look at Arcion as the alternative no-code CDC solution.

To get a zero-code, distributed CDC solution that works seamlessly with MySQL and SingleStore, try out Arcion and have a scalable and reliable CDC-enabled pipeline set up in minutes.  If you’re looking for a flexible on-premise solution, try out Arcion Self-hosted. Regardless of the deployment type chosen, Arcion provides unparalleled flexibility and scalability for data pipelines using MySQL as a data source and having SingleStore as the target destination for the data.

Before You Leave, Here Are Some FAQs

Still looking for more info? Let’s take a quick look at some of the most frequently asked questions about how to implement CDC between MySQL and SingleStore.

Does MySQL Support CDC?

MySQL is one of the most popular databases in the world and has been for quite some time. Typically, data is moved out of MySQL in batches, with pipelines created from time to time to move or copy data from MySQL to a data warehouse. These bulk imports and exports create gaps in the data, both because errors can delay the process and because the operations do not happen in real-time.

CDC is a technique that captures database inserts, updates, and deletes from data sources such as MySQL, providing an alternative to inefficient bulk imports. It does this in real-time and has become a preferred way of handling changes to your data with less fuss.

MySQL, therefore, is able to support CDC. This paradigm shift answers real-time sync requirements by capturing changes, keeping downstream systems up-to-date, and decreasing load, since only the relevant information is updated.

Is SingleStore An In-Memory Database?

Yes. SingleStore can function as an in-memory database. An in-memory database offers significant performance improvements: because the data is held in memory, the database can respond very quickly, even to simple queries. It can also maintain a pre-aggregated cache, allowing a large number of queries from multiple users to be answered with high performance.

SingleStore is so much more than just an in-memory database, though. It functions as an in-memory database when needed but also handles other workloads such as data warehouse workloads using a tiered storage model which includes in-memory, on-disk storage, and cloud object store to deliver fast performance.

Is SingleStore a Relational Database?

SingleStore is a relational database management system with a SQL interface. It has a distributed architecture, typically running on a cluster of servers, and stores data in memory, giving users the option to read and write data directly to and from DRAM. It can run in any Linux environment, including on physical hardware (bare metal), in virtual machines, or in containerized deployments.

Matt is a developer at heart with a passion for data, software architecture, and writing technical content. In the past, Matt worked at some of the largest finance and insurance companies in Canada before pivoting to working for fast-growing startups.
Luke has two decades of experience working with database technologies and has worked for companies like Oracle, AWS, and MariaDB. He is experienced in C++, Python, and JavaScript. He now works at Arcion as an Enterprise Solutions Architect to help companies simplify their data replication process.