In the dynamic and ever-evolving domain of database technologies, businesses, and their developers always seek options that fuse powerful capabilities with adaptability and efficiency. Every year presents a fresh cohort of new database technologies for on-premise and cloud-based platforms. Classic systems like MySQL have been strong pillars in both on-premise and cloud deployments, yet the relentless advancements in cloud technology have paved the way for promising contenders such as Google's Cloud SQL.
Transitioning to new technology often means deciding on migrating or replicating existing data across platforms, a process that usually comes with complexity and challenges. As one navigates the terrain of shifting data from MySQL to CloudSQL, a strategic blueprint and a comprehension of the intrinsic architectures are crucial. The utilities enabling these transitions range from effortlessly automated to difficult and manual. Picking the proper data migration and replication strategy becomes as pivotal as the foundational database systems.
This guide delves into MySQL and Cloud SQL, highlighting their characteristic attributes and offering various strategies to transport data between the platforms. Below, we will explore two unique techniques for migrating and replicating data from MySQL to a Cloud SQL instance. Initially, we will use Arcion as a seamless, automated data migration and replication method. Following that, we will examine a more traditional, hands-on manual process. Before delving into migration specifics, let's first get acquainted with the source and destination database platforms we will be using, MySQL and Cloud SQL.
Introduction to MySQL
MySQL is one of the most popular open-source Relational Database Management Systems (RDMS) in existence. The platform runs on a client-server model with numerous features ideal for small and large applications. Since MySQL is open-source, anyone can download the software and modify the source code to meet their business needs.
MySQL’s popularity is driven by its high performance, reliability, and the fact that it is very easy to use and deploy. Due to its ease of use, it became a staple in one of the most popular web app stacks used across the internet: the LAMP stack. It is compatible with many computing platforms such as Microsoft Windows, macOS, Linux, and Ubuntu and supports connectivity through languages like PHP, Python, Java/JDBC, C++, C, ODBC, Perl, and Ruby. It also supports open-source applications like Drupal, Joomla, Magento, and other popular CMS platforms, with the most popular CMS use case being WordPress. Battle-tested as one of the most popular database solutions for many of the world’s most extensive web applications, it has become a go-to for prominent organizations such as Facebook, Twitter, Netflix, Uber, and others with large volumes of data and users accessing their applications.
MySQL also has variations such as MySQL Heatwave, a fully managed service that enables customers to run OLTP, OLAP, and machine learning workloads directly from a MySQL Database. Also available is MySQL Enterprise Edition, with its complete set of advanced features, management tools, and technical support. Others include MySQL for OEM/ISV, and MySQL Cluster CGE used for web, cloud, and communications services.
Key Features of MySQL
- Open-source: MySQL is free and open-source; therefore, it can be modified through its source code to handle your basic requirements.
- Client/Server Architecture: MySQL is based on a client-server model where the database server (MySQL) can query and make changes with many clients (multiple applications and programs).
- Scalable: MySQL is developed to scale quickly to meet your business needs, be it a small or large amount of data, clusters of machines, etc. It can handle almost any amount of data.
- Character Sets: MySQL supports different character sets such as Latin 1 (cp 1252 character encoding), German, Unicode characters, etc.
- Data Types: MySQL contains multiple data types such as unsigned integers, signed integers, text, date, time, datetime, timestamp, blob, FLOAT, DOUBLE, character (CHAR), variable character (VARCHAR), and so on.
- Large Databases: MySQL supports large databases of up to about 40 to 50 million records, 150,000 to 200,000 tables, and up to 5,000,000,000 rows.
- Client and Utility Programs: MySQL contains many client and utility programs such as mysqladmin, MySQL Workbench, etc.
- Compatible With Many Operating Systems: MySQL runs on many operating systems like Microsoft Windows, macOS, Linux, varieties of UNIX, Nivell Netwave, and many more.
- MySQL Community: MySQL has a devoted community of developers that one can access to tackle issues and troubleshoot.
Introduction to Cloud SQL
Introduced by Google in 2011, Cloud SQL has become a key part of the Google Cloud Platform and is the mainstay in the Google Cloud SQL offering. Cloud SQL offers a fully managed database service for relational databases, including PostgreSQL, MySQL, and Microsoft SQL Server, similar to Amazon RDS. Since its inception, Cloud SQL has seen significant updates and enhancements, aligning itself with the ever-changing demands of cloud computing. Its ease of use, scalability, and security features have made it a preferred choice for many organizations looking to migrate or set up their databases in the cloud.
Running PostgreSQL on Cloud SQL blends the robust functionality of PostgreSQL with the flexible benefits of a cloud environment. The platform's seamless integration with PostgreSQL allows developers and database administrators to leverage all standard PostgreSQL functionalities without the burden of infrastructure management. Since everything is hosted in the cloud, Google Cloud automates tasks like backups, updates, and maintenance to give developers more time to focus on what matters: building and optimizing their applications.
The scalability aspect of Cloud SQL is particularly appealing to businesses that experience fluctuating workloads. It supports vertical and horizontal scaling, and the presence of read replicas further enhances the system's efficiency. Running PostgreSQL on Cloud SQL also ensures that the PostgreSQL instance is equipped with data encryption, private networking, and compliance with various industry standards, enhancing overall security.
Cloud SQL's tight integration with other Google Cloud services, such as App Engine, Compute Engine, and Google BigQuery, creates a Google-backed ecosystem that allows for easy operation across different parts of your applications. Cloud SQL combines traditional SQL capabilities with modern cloud-based benefits and provides a powerful and streamlined platform tailored to today's data-driven needs.
Key Features of Cloud SQL
Below are a few key features that make a Google Cloud SQL instance an attractive offering when running databases in the cloud.
- Fully Managed Service: Google handles patch management, backups, updates, and failover for hassle-free database management.
- Scalability: Automatic scaling, both vertically and horizontally, allows easy adjustments per the application's needs.
- High Availability: Ensures high availability through automatic backups, replication, and failover in different regions.
- Integrated Monitoring & Logging: Integrates with Google's monitoring and logging tools, providing a clear overview of database performance.
- Data Encryption: Data is encrypted at rest and in transit, ensuring security.
- Compliance: Compliance with HIPAA, PCI DSS, and GDPR standards.
How to Migrate Data from MySQL to Cloud SQL
Implementing an effective data pipeline is crucial for a smooth migration when migrating data from MySQL to a Google Cloud SQL database, specifically PostgreSQL running on Cloud SQL. This section will explore two main ways to migrate and replicate data between these platforms. First, we will look at using Arcion to move data from MySQL to Cloud SQL in an automated fashion, taking advantage of the platform's advanced capabilities and automation features. Secondly, we will examine the manual migration approach, utilizing native tools and commands built into the platforms. This hands-on method provides control and flexibility but requires a comprehensive understanding of MySQL, CloudSQL, and PostgreSQL. Let’s begin by Looking at Arcion.
Method 1: MySQL to Cloud SQL Using Arcion
Arcion is the leading real-time, in-memory Change Data Capture (CDC) solution that offers users massive scalability and data consistency. It is suitable for migrating and replicating data across various enterprise databases and data warehouses. This section will look at how to connect Postgres to Cloud SQL (MySQL) using the Arcion UI. This can be done in Arcion Cloud or Arcion Self-Hosted. The following functionality is also available via CLI, although this blog does not cover it.
Create a New Replication
First, we must log into Arcion to set up our replication pipeline to enable CDC. For Arcion Cloud, you can sign up and log in here.
Next, we will select our replication mode and write mode. A few options are available to suit your needs. For replication modes, Arcion supports:
- Snapshot (the initial load)
- Realtime (CDC)
- Full (snapshot + CDC)
For write modes, Arcion supports:
We will select the Replication mode as Full and the Write mode as Truncating. You will also see that I have named the replication “MySQL to Cloud SQL PostgreSQL”. Once the Name is populated and the Replication and Write Modes are selected, click Next at the bottom of the screen.
Configure the Source PostgreSQL Connector
We are then brought to the Source screen. From here, we will click the Add your own source button.
We then will select MySQL as our source, and then, scroll to the bottom of the page, and click Continue.
Now, we can add our Postgres instance details. These details include:
- Connection Name
All other fields will default. For Username and Password, you will need to ensure that the user has the correct permissions in Postgres to ensure it can read the necessary data.
With our data source correctly configured, we will click the Test Connection button to ensure everything works as expected.
The results should look like this once the test is finished running. Finally, you can click the Done button to exit.
Once the connection is saved and tested, click Continue. Next, we will want to pull in the schema from the database. Arcion will prompt us to click the Sync Connector button on the next page. Click the button, and Arcion will connect to our PostgreSQL instance and pull down the schema. Once completed, the UI in Arcion will display the retrieved schema. Then, we will click Continue in the bottom right corner of the screen to move to the next step.
With our sync successful, we can now click Continue to Destination in the bottom right corner of the screen to move to the steps where we set up our destination.
Configure the Destination Cloud SQL PostgreSQL Connector
On the Destination screen, we will click Add your own destination to start the setup of our Cloud SQL connector.
Then, select Cloud SQL PostgreSQL as your Connection Type and click Continue.
On the next screen, input your connection details. These details include:
- Connection Name
All other fields will default. For username and password, we will require a user that can connect and write to our Cloud SQL PostgreSQL instance.
Once you enter all the details and click Test connection, you’ll see that the Cloud SQL connector can connect to the target PostgreSQL database.
Configure the Filter
Now, after clicking Done on the connection test, we can click Continue to Filter to begin the Filter configuration for our pipeline.
On the Filters screen, we will check the Select All checkbox to replicate all our tables and columns from the source to the destination.
Start The Replication
After clicking Save to save the filter, click Start Replication.
Once the initial data is loaded, the pipeline will continue to run, monitor for changes, and apply those changes to the destination. The idle pipeline will still show RUNNING in the top right of the screen but will show a row replication rate of 0 until new data is written to the source. You’ll also notice that the Phase description of the pipeline will now show Change Data Capture instead of Loading Snapshot Data.
Since we are using Full-mode replication, If we start adding data to the MySQL instance, we will see that Arcion detects this and then syncs that data to CloudSQL in real-time.
Of course, we don’t require CDC to be enabled to handle the initial migration, and we could use Snapshot mode. An advantage of enabling CDC during the migration is that CDC can make migrations easier by allowing for zero downtime migrations. The old and new databases can stay in sync until all applications and services using the old database are moved to use the new PostgreSQL instance hosted on Google Cloud SQL.
Benefits of Using Arcion
As mentioned, MySQL and PostgreSQL hosted on Cloud SQL are fully supported by Arcion. The steps required to enable data migration from MySQL to Cloud SQL, discussed in previous sections of this article, can be more conveniently carried out with Arcion. This can be done through a simplified and intuitive UI or the CLI. For example, you can set up a streaming pipeline in Arcion using Change Data Capture to replicate changes from MySQL to any of the other destination databases supported, like Cloud SQL, or others like Oracle, DB2, SAP (ASE, HANA, IQ), Snowflake, and many more. Below are some highlights of using Arcion for data migrations and Change Data Capture.
Many other existing CDC solutions don’t scale for high-volume, high-velocity data, resulting in slow pipelines and delivery to the target systems. Arcion is the only distributed, end-to-end, multi-threaded CDC solution that auto-scales vertically & horizontally. Any process that runs on Source & Target is parallelized using patent-pending techniques to achieve maximum throughput. There isn’t a single step within the pipeline that is single-threaded. Arcion users get ultra-low latency CDC replication and can keep up with the ever-increasing data volume on Source.
100% Agentless Change Data Capture
Arcion is the only CDC vendor that offers 100% agentless CDC to all its supported enterprise connectors. The agentless CDC applies to all complex enterprise databases, including MySQL and Cloud SQL. Arcion reads directly from the database transaction log, never from the database itself. Previously, data teams faced administrative nightmares and security risks associated with running agent-based software in production environments. You can now replicate data in real-time, at scale, with guaranteed delivery — but without the inherent performance issues or the security concerns.
Data Consistency Guaranteed
Arcion provides transactional integrity and data consistency through its CDC technology. To further this effort, Arcion also has built-in data validation support that works automatically and efficiently to maintain data integrity. It offers a scalable data migration and replication solution while ensuring zero data loss has occurred.
Automatic Schema Conversion & Schema Evolution Support
Arcion handles schema changes out of the box, requiring no user intervention. This helps mitigate data loss and eliminate downtime caused by pipeline-breaking schema changes by intercepting changes in the source database and propagating them while ensuring compatibility with the target's schema evolution. Other solutions would reload (re-do the snapshot) the data when there is a schema change in the source databases, which causes pipeline downtime and requires a lot of compute resources, which can get expensive!
Pre-Built Enterprise Data Connectors
Arcion has a library of pre-built data connectors. These connectors can support many of the most popular enterprise databases, data warehouses, and cloud-native analytics platforms (see complete list). Unlike other ETL tools, Arcion provides complete control over data while maintaining high automation. Depending on your use case, data can be moved from one source to multiple targets or multiple sources to a single target. This means that if you branch out into other technologies outside of MySQL and Cloud SQL, you’ll already have the capability within Arcion to handle your new sources and targets without the need for another pipeline technology.
Method 2: MySQL to Cloud SQL Manually
For those who want a more manual approach, this method involves exporting data from MySQL, transforming it into a PostgreSQL-friendly format, and importing it into a PostgreSQL instance hosted on Cloud SQL. Below is a step-by-step guide using native tools for manually migrating data from MySQL to Cloud SQL.
To migrate data from MySQL to a PostgreSQL instance hosted on Google Cloud SQL, you’ll need the following prerequisites:
- The mysqldump command-line client is installed on the machine where your MySQL instance is hosted.
- The pgloader tool is installed on the machine where your MySQL instance is hosted.
- The Google Cloud SDK (`gcloud` command-line tool) is on the server where your MySQL instance is hosted.
Step 1: Backup your MySQL Database
First, we will use the `mysqldump` tool to back up your MySQL database. This backup will then be used as the input for loading your data into the Postgres instance. Below is an example command that uses mysqldump to export data:
mysqldump -u [USERNAME] -p[PASSWORD] --databases [DATABASE_NAME] > backup.sql
Step 2: Set up a PostgreSQL Instance on Google Cloud SQL
Next, we will create our PostgreSQL instance on Google’s Cloud SQL platform. To do this, you can follow the steps outlined here. After following these steps, your Postres instance will be ready, and you can move forward with loading the MySQL data into it.
Step 3: Install `pgloader`
After our PostgreSQL instance is created, we will load the MySQL data into it using pgloader. First, you’ll need to make sure you have installed pgloader. If you're on macOS, you can use Homebrew:
brew install pgloader
For other OS, check the official pgloader installation documentation.
Step 4: Migrate the Data Using `pgloader`
Once pgloader is installed, we will use pgloader to convert the MySQL dump to a PostgreSQL format and load it into the Postgres instance on Cloud SQL. Below is an example of how the tool can be used in the command line. Of course, replace the placeholders in the command with your actual credentials:
pgloader mysql://[MYSQL_USERNAME]:[MYSQL_PASSWORD]@[MYSQL_HOST]/[MYSQL_DATABASE_NAME] \
Step 5: Verify the Migration
The last step in the migration process is to verify that the data was successfully migrated. To verify the migration, connect to the PostgreSQL instance on Cloud SQL using the psql command or any other PostgreSQL client. Below is an example command using psql to connect to the PostgreSQL instance.
psql -h [Cloud SQL_PUBLIC_IP] -U [POSTGRES_USERNAME] -d [POSTGRES_DATABASE_NAME]
Once logged into your Postres instance, run a few SQL queries to ensure your data has been migrated correctly. For example, run a simple SELECT statement that counts the rows to ensure it matches your expectations.
SELECT COUNT(*) FROM [YOUR_TABLE_NAME];
Step 6: Optimize and Tune your PostgreSQL Instance
After migrating, you can do some tests to ensure that the PostgreSQL database is optimized for your workload. This might involve:
- Running `VACUUM` and `ANALYZE` operations to update statistics.
- Reviewing and possibly tuning PostgreSQL configuration settings, such as `work_mem` and `shared_buffers`.
Following the steps above, you’ll have successfully migrated your MySQL database to a PostgreSQL instance hosted on Google's Cloud SQL. It's crucial to monitor the performance of your new database and adjust configurations as needed. Retesting your application functionalities to ensure they work seamlessly with your new PostgreSQL instance might also be a good idea.
Limitations and Challenges of Using Custom Code Method
Even though manually moving data from a MySQL database to Cloud SQL with custom code is possible, it does come with some significant limitations and challenges. Below is a short list of some limitations and challenges impacting this method.
The manual method described above requires careful handling and understanding of MySQL, PostgreSQL, and Cloud SQL. This includes detailed knowledge of SQL dialects, indexing, stored procedures, and more. The absence of automated tools increases the complexity, making this method more suited to seasoned database professionals.
Differences in data types, constraints, and encoding between MySQL and PostgreSQL might lead to compatibility issues. Translating these differences manually may require significant effort and expertise, especially when migrating complex or non-standard data structures.
This process can be labor-intensive and time-consuming, especially with large datasets. Each step, from data export, conversion, and importing into Cloud SQL, must be executed carefully, and any unforeseen issues can lead to further delays.
Any changes or updates to the schema must be managed manually, increasing maintenance effort. Unlike automated solutions, the manual method does not offer continuous synchronization or real-time updates. In ongoing migration scenarios, this can cause misalignment between the source and target databases. The impacts of these issues may not be observed until it’s too late.
Without proper tools, error handling can be complex and lead to inconsistent data. A small mistake in data conversion or misinterpretation of data types can create anomalies within the migrated data. Tracking and rectifying such errors may require extensive troubleshooting and validation.
These challenges underline the importance of meticulous planning and execution when opting for the manual migration method from MySQL to Cloud SQL. While it provides greater control and customization, it also demands a high level of technical skill and attention to detail. It’s apparent from these challenges that automated tools offer a massive advantage over these native, legacy approaches.
Migrating data from MySQL to Cloud SQL is an important task that can be accomplished using specialized tools like Arcion or a manual, hands-on approach. Understanding the key features of MySQL and Cloud SQL and the specific migration requirements is essential in choosing the proper method. Careful planning, consideration of potential challenges, and leveraging available resources can lead to a smooth and successful migration, ensuring data integrity and availability in the new environment.
As we saw, Arcion is a real-time, in-memory Change Data Capture (CDC) solution that guarantees scalability and ensures data consistency when moving data from MySQL to a PostgreSQL instance hosted on Google’s Cloud SQL platform. To quickly implement a data integration strategy, get started with Arcion today by connecting with our team of MySQL and Cloud SQL migration and replication experts. CDC pipelines built with Arcion are easy to build, maintain, and scale and are production-ready in minutes.