Securely migrating 400M rows in minutes from MySQL to SingleStore using Arcion with zero downtime

Source
Target
distributed
database

A fast-growing peer-to-peer publishing platform with customers in 23 countries, Publica successfully migrated from MySQL, PostgreSQL, and Redis to SingleStore using Arcion Cloud without any downtime.

Company Highlights
1M+
active users
1200+
stores
23
countries
Solution
Highlights
4+
weeks of engineering time saved
400M
rows migrated
4,5K
rows/second transferred on average

Who

At Publica, we empower our customers to take control of their online business by providing not only an e-commerce platform where they have the flexibility to choose their own business models, but also by helping them nurture relationships with their audiences.

We have 1 million+ active users consuming content, 1200+ stores, and customers from 23 countries. Our customers include Narcea, EUNSA, eBooks Patagonia, libreka! Distribution, alphaeditorial, Grupo Planeta, Penguin Random House, Lejister.com, and Mandioca.

What and Why

Being a customer-centric company, we always strive to work on exactly what our customers need the most. At times, that meant technical excellence took a backseat. As we and our customers grew over the years, we started to put more work into our database. 

But we had trouble scaling search for our biggest customers, the more complex storefronts queries took more time than we initially expected, plus we had other small technical issues. Moreover, having three different database technologies did not make the infrastructure optimization process easy.

To follow our motto, we knew we had to do better. So we decided to migrate away from the trusty MySQL, PostgreSQL and Redis that diligently served us for many years, into the bleeding edge of SingleStore. We consolidated everything into one super powerful and multi-purpose database.

How

We're not new to the challenges of database migrations. Since the very beginning, we've worked on two migration projects. 

This time was different. We’re no longer a small business with a few users. We have grown to a size where we can’t afford to have any downtime because it would impact our customers’ business. Every minute of downtime means thousands of frustrated users and lost revenue. Not to mention, we were also dealing with 25x the data we had before.

We had options. 

We previously used a dump and import strategy, but this did not solve speed and data correctness issues. Speed is key because we need to minimize downtime.

We also used CSV and Parquet exports to later import via SingleStore pipelines. These were much faster than dump and import, but still complex and error prone. Most importantly, none of these options offered real-time data sync, a simple to use dashboard ,or real-time migration metrics.

We could have taken down the entire platform for about 6 hours, while we migrated everything and ran lots of checks to validate that the migration was successful. We could have invested many months into refactoring our app so that it could work with both databases at the same time. We could also have hired an expensive professional service firm to help us.

Instead, we discovered Arcion Cloud.

Arcion Cloud is a managed service that sits on their rock solid self-hosted product. Additionally, it provides a very user friendly GUI to set up, run, and monitor your migrations. 

Arcion actually uses the word “replication”, which is accurate since it does not erase anything in the source. It became the perfect choice for us. As a small, lean engineering team, we have a very strong ethos, hence are inclined away from managing any infrastructure.

There are two types of replications you can choose from: Snapshots and Real Time.

Snapshots are your more typical data replication — take this data from database A and migrate it to database B.

Real time is where things start to get interesting. You can instruct Arcion to do a real-time sync between a data source and a destination. In MySQL, it does that through a system called Change Data Capture (CDC). 

Now, if you're like me, you've probably never heard of the term CDC. It’s short for Change Data Capture, and it basically means there's a log of everything that happens in the data store and that log can be consumed without querying against the source database directly.

I was pleased to find out that Arcion goes one step further and allows you to set up a "Full" replication, in which it will first do a snapshot and then switch to the CDC mode, effectively keeping the source and destination in sync and with sub second latency. Yes, you read that right - Arcion ensured that our MySQL and SingleStore databases were always up to a second behind each other.

Arcion provides many different options to fine tune a replication, and also to manage performance and resource usage. This is especially useful since you could easily choke your databases given how powerful Arcion is. For example, you can manually select the tables of a replication, update those table's names in the destination, and set up engine specific parameters like a table's storage type.

Since we migrated from MySQL, PostgreSQL and Redis to SingleStore, we needed deep schema changes, new indexes, and SingleStore specific parameters. Thankfully, Arcion also lets you choose a "Write mode" so you can decide if you want to create the schemas from scratch (“Replacing”), or preserve the existing ones and go straight to inserting data (“Truncating”). We used the latter and it allowed us to carefully handcraft the schemas for optimal performance and then hand over the data replication task to Arcion.

We had a handful of databases to migrate, so we prioritized them by size and complexity. Some of those were from systems we could 100% pause without impacting any user and carry out the replication with ease, so naturally, those were the first ones. We took about a week of active time to learn how to use all of Arcion’s features while effectively migrating those low-hanging fruit.

After migrating the first batch of databases, we felt we had enough experience and a clear idea of how to approach the process with our biggest production database. 

This is a mission-critical database where downtime needed to be minimized as much as possible. Our action plan consisted of 50+ singular steps that are specifically unique to our project, but below are the steps that would apply to anyone doing this kind of database migration work:

  1. Setup CDC in the source database.
  2. Create Arcion’s user in both source and target databases.
  3. Manually import the schema in the target database (you may not even need to do this).
  4. Set up the replication in Arcion’s GUI.
  5. Run the replication.
  6. Monitor the process in Arcion’s dashboard and wait for the replication to switch from snapshot to real-time mode.
  7. Enable maintenance mode.
  8. Swap the database being used by the app.
  9. Stop the replication from Arcion’s dashboard.
  10. Disable maintenance mode.
  11. Sit back and relax.

The migrations were a big success. Arcion Cloud transferred 4,500 rows/second on average, which made it a seamless process for us! Once the migration is completed successfully, our customers instantly notice that our service is much faster. But not everything was smooth sailing: we encountered some difficulties and weird AWS RDS settings that needed tweaking. Thankfully, Arcion’s support team is as effective as their product and they helped us along the way.

In the end, we estimate Arcion has helped us save at least 4 weeks of engineering effort. That’s time we’ve been able to put towards improving our product instead of being data shepherds. Arcion has become the de facto data replication tool at publica.la, and I will personally use it every time I see fit.

Join our newsletter

Take Arcion for a Spin

Deploy the only cloud-native data replication platform you’ll ever need. Get real-time, high-performance data pipelines today.

Get started for free

5 connectors: Oracle, MySQL, Databricks, Snowflake, SingleStore

Pre-configured enterprise instance

Available in four US AWS regions

Free download

20+ enterprise source and target connectors

Deploy on-prem or VPC

Satisfy security requirements

Join the waitlist for Arcion Cloud (beta)

Fully managed, in the cloud.

Start your 30-day free trial with Arcion self-hosted edition

Self managed, wherever you want it.

Please use a valid email so we can send you the trial license.