To keep up with the competition, companies must collect data across various platforms and entities. This is necessary to gain a competitive advantage and better understand consumer requirements and behaviors. Historically, this data was stored in database tables with relationship parameters set between them. Relational data was easy to store and retrieve provided that the platform where the data was stored and accessed was performant.
As corporations and organizations began to generate data at an exponential rate, the limits of traditional databases were tested. It became important to analyze the data in near real-time, something that traditional databases were not designed for. Many companies tried to create processes and systems which could facilitate use cases that required real-time data access and analytics but most were not optimal. This is when many companies began to turn to big data platforms which could handle these tasks.
The advent of big data, where the volume of data increased multiple folds with the data constantly changing, meant that the struggle of the traditional Relational Database Management Systems (RDMS) became more apparent. Traditional use cases could still be handled but for real-time analytics and predictions, these systems were no longer good enough for enterprise needs. This led to the rise of NoSQL databases that could handle unstructured data, such as free-form text, and massive volumes of it without relying on a strict schema for relationships. However, the architecture of these newer databases was more difficult to operate and required additional expertise.
NoSQL databases also lacked some of the advantages of traditional relational databases since they did not support SQL. Due to the unstructured nature of the data, it was often a hassle to express relationships between data. After a while, a new set of modern databases emerged that combined the strength of relational databases with the best advantages of NoSQL databases. These databases can handle concurrent workloads, and transactional/analytical operations and are also SQL-compliant.
With the introduction of these SQL-supported databases with NoSQL performance, many organizations are still able to use the expertise built up over the years of using relational databases. Organizations are now migrating their data to databases that support SQL but are optimized under the hood to support concurrency, low latency, and high workloads/volumes. One of the platforms of choice for this is SingleStore, previously known as MemSQL. SingleStore is a modern proprietary database optimized for high throughput and data-intensive applications. This makes it a perfect fit for many organizations that desire this exact type of functionality and performance.
In this article, we will explore how to employ techniques to load data from MySQL, an open-source relational database, to SingleStore. The techniques shown in this article can usually be applied to other SQL databases as well with slight variations. Overall, the methodology for moving data will be extremely similar. Let’s first get started by Looking at some details about MySQL.
What is MySQL?
MySQL is the most popular open-source database in the world. It is a relational database that supports SQL and Online Transaction Processing (OLTP) operations. MySQL has been in active development for over 25 years and has changed ownership a few times over the years. Along the way, its ownership has changed hands from MySQL AB, the original company that developed the database, to Sun Microsystems and now Oracle Corporation.
MySQL is offered under the open-source GNU General Public License which gives consumers the right to use and modify the software according to those terms. It is also available under various proprietary licenses. MySQL stores data logically in tables, rows, columns, and views but under the hood data is stored in physical files. Storing data like this enables the platform to be optimized for speed and allows for better performance when accessing records from the database.
The client-server model is what forms the architecture of MySQL. With this model as its hub, MySQL can consist of stand-alone clients and a server. The clients can be thought of as applications that can connect and access the data on the server. The clients themselves rely on the server to be able to query, retrieve, and save data changes. The server is the actual MySQL RDBMS that serves as the data store and processes the queries and operations dispatched from the clients.
According to DB-Engines.com, one of the most popular sites that rank the popularity of databases, MySQL is the second most popular database just behind Oracle’s own database offering. That being said, MySQL is a favorite amongst developers and data engineers because of its ease of use, flexibility, and huge support for many programming languages and drivers.
MySQL’s popularity also comes from its inclusion as a component of the extremely popular LAMP stack, a massively popular software stack used to build and run web applications and services. MySQL’s reach is far and wide and is used by some of the biggest websites and services on the internet such as Facebook, Flickr, Twitter, YouTube, and other industry heavyweights. Oracle also offers enterprise versions of MySQL for more specialized use cases, however, the open-source MySQL options are very efficient, reliable, and can be used to run business-critical applications without issue.
Key Features of MySQL
In this section, we will explore some of the key features of MySQL. Many of the highlighted features below are key advantages of using MySQL for data storage and retrieval solution. Let’s take a look at each of these features in more detail.
Ease of use
One of the most important features of MySQL is that it is easy to install and use. It can easily be installed across all popular operating systems including Windows, MacOS, and Linux. Since MySQL uses the SQL standard for interaction with the database, database administrators skilled in SQL can easily get up to speed using MySQL.
MySQL has been around for a long time and is battle-tested on the security front. MySQL has multiple mechanisms to securely store data and protect it from unauthorized access. MySQL uses effective security measures such as encryption and privilege management to ensure the database is secure enough to run in production environments.
MySQL is a scalable database as it employs a native replication architecture that allows for the database to support a large number of users. MySQL is also supported on certain platforms, such as Amazon RDS, which allows it to easily scale with demand.
MySQL databases can be implemented in a highly-available architecture. Using an HA-enabled MySQL instance helps to prevent cases of data loss. It has options for disaster recovery and full replication capabilities to make data available to other downstream systems.
MySQL is an extremely performant and reliable database option. MySQL is a highly efficient database that is Atomicity, Consistency, Isolation, and Durability (ACID) compliant. ACID compliance ensures that the data is stored consistently.
What is SingleStore?
SingleStore is a cloud-native database platform consisting of a unified data engine. SingleStore’s engine can handle both Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP) operations. It is a distributed relational database that supports SQL queries and patterns.
SingeStore has two main ways to deploy its solution: A self-managed and a fully-managed version on the Cloud. SingleStore’s fully managed cloud offering is available through SingleStoreDB Cloud. SingleStoreDB Cloud gives the user the freedom to choose any of the major cloud service providers on which the managed instance can be deployed. Supported cloud providers include Amazon Web services (AWS), Azure, and Google Cloud Platform (GCP). For users that want to run SingleStore on their hardware and manage their database cluster, SingleStore has the SingleStoreDB Self-Managed option. This option requires that users manually deploy and manage the database.
SingleStore is optimized for speed and scalability to support data-intensive and real-time applications where fast ingestion of data is required. It is also a wonderful option for applications that need to support many concurrent users in parallel and return results to queries with sub-second latency. SingleStore is also used for many real-time analytics applications and use cases.
Another advantage of SingleStore is the large number of data types that it supports. SingleStore supports the ingestion of relational data, geospatial data, JSON, time series data, key-value pairs, and full-text search. It also supports the streaming of data, bulk loading of data, and data transformations within the platform. With its large capacity for different data types and streaming capabilities, SingleStore can serve as the input to a machine learning model or be integrated into a real-time analytics dashboard.
Key Features of SingleStore
SingleStore is a database that combines the best features of a relational database, like MySQL, with a distributed cloud-native approach. The cloud-native architecture means that SingleStore is scalable, fast, and can supercharge ML and analytics use cases. In this section, let’s explore some of the key features of SingleStore.
SingleStore is an extremely fast database that enables rapid ingestion of data from various sources. The speed of the SingleStore database goes beyond just the ingestion of data and is also extended to queries. Queries that are executed within SingleStore experience low latency and support high concurrency. SingleStore utilizes both row-store and column-store patterns thereby allowing it to store and retrieve data in the most efficient way possible.
SingleStore is a highly scalable database as it separates compute from storage. This enables it to achieve infinite elasticity as data for applications can be scaled horizontally on the cloud. With its cloud-native approach, SingleStore can be seamlessly scaled cost-effectively while still maintaining high performance.
SingleStore is secure by design and employs end-to-end encryption. The platform can ensure that unauthorized access is prohibited through secure methods of authentication and authorization. SingleStore is also compliant with enterprise security expectations and has industry certifications like ISO/IEC 27001 and SOC Type 2.
Simplicity and Connectability
SingleStore is available as a managed cloud offering so users can be up and running in minutes. Since SingleStore is compatible with other SQL databases, like MySQL, users can utilize their existing tooling and do not need to learn new tools or workflows. SingleStore has various APIs that allow developers to easily integrate them with their applications. SingelStore can also support developers focusing on serverless application development.
How to Load Data from MySQL to SingleStore Using Arcion
To load data from a relational database like MySQL to SingleStore, some concerns traditionally would need to be solved. Part of the migration process would require you to make sure that your database schema and types are compatible. You would also need to make sure that data is not lost in the migration process. This would require you to check that the data in SingleStore is consistent with what was in the primary MySQL database.
Not having a well-implemented migration plan can lead to data loss or data that is inaccurate and incomplete. To avoid such a scenario, you may be best of to use a tool that can easily remove the downside of traditional data migration. For instance, using a data integration platform can help prevent data loss, automatically help with schema conversion, and guarantee data consistency during the migration process. One such platform which can help to make migrations more simple is Arcion.
Arcion is the first cloud-native Change Data Capture (CDC) platform that is designed for infinite scalability and low latency. Arcion supports both MySQL and SingleStore on both Arcion Cloud as well as Arcion Self-Managed. Depending on your exact use case and needs, both deployment options can work. In this section, let's look at how to use Arcion Cloud to load data from MySQL to SingleStore in a few simple steps.
First, we will need to log into Arcion Cloud to set up our replication pipeline to enable CDC. You can sign up and log into Arcion here.
Next, we will select our replication mode and write mode. A few options are available to suit your needs. For replication modes, Arcion supports:
- Snapshot (the initial load)
- Full (snapshot + CDC)
For write modes, Arcion supports:
We will select the replication mode as Full and the write mode as Truncating for our purposes here. You will also see that I have named the replication “MySQL to SingleStore.”
Once the Name is populated and the Replication and Write Modes are selected, click Next at the bottom of the screen.
We are then brought to the Source screen. From here we will click the Create New button.
We then will select MySQL as our source, and then, scroll to the bottom of the page, and click Continue.
Now, we can add our MySQL instance details. These details include:
- Connection Name
All other fields will default. For Username and Password, we will use the users created in the script we ran earlier against our MySQL instance.
Once the connection is saved, we will want to pull in the schema from the database. Arcion Cloud will prompt us to click the Sync Connector button on the next page. Click the button, and Arcion Cloud will connect to our MySQL instance and pull down the schema.
Once completed, the UI in Arcion Cloud will display the retrieved schema. Then we will click Continue in the bottom right corner of the screen to move to the next step.
We now have our data source correctly configured. This will be displayed on the next screen, as well as a Test Connection button. To ensure that everything is working correctly, we will click the Test Connection button.
The results should look like this once the test is finished running. Finally, you can click the Done button to exit.
With our test successful, we can now click Continue to Destination in the bottom right corner of the screen to move to the steps where we set up our destination.
On the Destination screen, we will click New Connection to start the setup of our SingleStore connector.
Then, select SingleStore as your Connection Type and click Continue.
On the next screen, input your connection details. These details include:
- Connection Name
All other fields will default. For username and password, we will use the users created in the script we ran earlier against our SingleStore instance.
Once you put in all the details and click Continue, you’ll see the SingleStore connection is created.
Now, we can click Continue to Filter to begin the Filter configuration for our pipeline. On the Filters screen, we will check the Select All checkbox to replicate all of our tables and columns from the source to the destination.
You can also click on the Map Tables and Per Table Configs (Applier Configuration Docs, Extractor Configuration Docs) buttons to add further configuration. For our purposes, we will leave these as their default values. After this, you will click Start Replication.
Once the initial data is loaded, the pipeline will continue to run, monitor for changes, and apply those changes to the destination. The idle pipeline will still show RUNNING in the top right of the screen but will show a row replication rate of 0 until new data is written to the source. You’ll also notice that the Phase description of the pipeline will now show Change Data Capture instead of Loading Snapshot Data.
If we start adding data to the MySQL instance, we will see that Arcion detects this and then syncs that data over to SingleStore in real time.
Of course, we don’t require CDC to be enabled to handle the initial migration, we could just use the Snapshot mode. An advantage of enabling CDC during the migration is that CDC can make migrations easier by allowing for zero downtime migrations. The old and new databases can stay in sync until all applications and services using the old database are moved to use the new SingleStore instance.
Benefits of using Arcion
As mentioned, MySQL and SingleStore are fully supported by Arcion. The steps required to enable data migration from MySQL to SingleStore, which was discussed in previous sections of this article, can be more conveniently carried out with Arcion. This can be done through a simplified and intuitive UI or the CLI. For example, you can set up a streaming pipeline in Arcion using Change Data Capture to replicate changes from MySQL to any of the other destination databases supported, like SingleStore, or others like Oracle, DB2, SAP (ASE, HANA, IQ), Snowflake, and many more. Below are some highlights of using Arcion for data migrations and Change Data Capture.
Many other existing CDC solutions don’t scale for high-volume, high-velocity data, resulting in slow pipelines, and slow delivery to the target systems. Arcion is the only distributed, end-to-end multi-threaded CDC solution that auto-scales vertically & horizontally. Any process that runs on Source & Target is parallelized using patent-pending techniques to achieve maximum throughput. There isn’t a single step within the pipeline that is single-threaded. It means Arcion users get ultra-low latency CDC replication and can always keep up with the forever-increasing data volume on Source.
100% Agentless Change Data Capture
Arcion is the only CDC vendor in the market that offers 100% agentless CDC to all its supported enterprise connectors. The agentless CDC applies to all complex enterprise databases including MySQL and SingleStore. Arcion reads directly from the database transaction log, never reading from the database itself. Previously, data teams faced administrative nightmares and security risks associated with running agent-based software in production environments. You can now replicate data in real-time, at scale, with guaranteed delivery — but without the inherent performance issues or the security concerns.
Data Consistency Guaranteed
Arcion provides transactional integrity and data consistency through its CDC technology. To further this effort, Arcion also has built-in data validation support that works automatically and efficiently to ensure data integrity is always maintained. It offers a solution for both scalable data migration and replication while making sure that zero data loss has occurred.
Automatic Schema Conversion & Schema Evolution Support
Arcion handles schema changes out of the box requiring no user intervention. This helps mitigate data loss and eliminate downtime caused by pipeline-breaking schema changes by intercepting changes in the source database and propagating them while ensuring compatibility with the target's schema evolution. Other solutions would reload (re-do the snapshot) the data when there is a schema change in the source databases, which causes pipeline downtime and requires a lot of compute resources, which can get expensive!
Pre-Built Enterprise Data Connectors
Arcion has a library of pre-built data connectors. These connectors can provide support for 15+ enterprise databases, data warehouses, and cloud-native analytics platforms (see full list). Unlike other ETL tools, Arcion provides full control over data while still maintaining a high degree of automation. Data can be moved from one source to multiple targets or multiple sources to a single target depending on your use case. This means that if you branch out into other technologies outside of MySQL and SingleStore, you’ll already have the capability within Arcion to handle your new sources and targets without the need for another pipeline technology.
This article gave you a detailed overview of migrating data from MySQL to SingleStore. We explored what MySQL and SingleStore are, looked at their benefits and features, and finally took a look at Arcion as a no-code migration solution.
To get a no-code migration solution that works seamlessly with MySQL and SingleStore, try out Arcion Cloud and have a scalable and reliable CDC-enabled pipeline set up in minutes. If you’re looking for a flexible on-premise solution, try out Arcion Self-Managed. Regardless of the deployment type chosen, Arcion provides unparalleled flexibility and scalability for data pipelines using MySQL as a data source and having SingleStore as the target destination for the data.
Interested to learn more? Schedule a personalized demo with our database expert today.