In a recent webinar on May 24th, Rajkumar Sen, Founder & CTO at Arcion Labs, teamed up with Manish Kumar, Sr. Product Manager at SingleStore, to provide an overview of how to easily migrate databases from Oracle and MySQL to SingleStoreDB using Arcion Cloud.
The Rapidly Changing Database Landscape
The First Era: Relational, Transactional Workloads
- Legacy systems (Oracle, MySQL)
- Relational in nature, built for transactional workloads
Data has been around since the early days of computing, but it wasn't until the rise of relational databases that data became a true asset for businesses. Relational databases, such as Oracle and MySQL, are designed to store data in a structured way, making it easy to query and retrieve information. This made them ideal for transactional workloads, such as processing orders or managing inventory.
However, as businesses began to collect more and more data, the limitations of relational databases became apparent. Scaling them typically meant bolting on custom logic for sharding, so they struggled to scale effectively, and their rigid structure made it difficult to handle unstructured data.
In this day and age, there are new use cases for databases that the traditional relational model was not designed to handle. For example, today's applications are more likely to be distributed, meaning that there is a need for horizontal scaling rather than vertical scaling.
In addition, many applications now require real-time data processing, which traditional relational databases struggle to deliver. As a result, a new type of database emerged to meet the needs of modern applications.
The Second Era: Faster Database Management, But Not Without Downsides
- MongoDB, Cassandra
- Scale and concurrency
The second era of databases, which began around 2010, was marked by a new breed of databases designed for scale and concurrency. These databases, such as MongoDB and Cassandra, could handle large data sets and complex architectures, making them excellent for more modern applications.
There were two major benefits of NoSQL databases:
- Ability to handle large volumes of data: NoSQL databases were designed to be scalable from the ground up. This meant that they could handle large data sets without any performance issues.
- Flexibility: their flexible schemas could easily adapt to the changing needs of a business.
However, one challenge that emerged is the need to support a variety of data types, as traditional relational databases are designed primarily for structured, tabular data.
To address this challenge, many NoSQL providers have adopted the use of special-purpose datastores, which are designed to support multiple data types. This approach has proven to be effective in meeting the needs of businesses that require the flexibility and scalability of a NoSQL database, while also supporting the storage of multiple data types.
However, these databases came with their own set of problems. Because they were designed for scale and concurrency, they sacrificed features that made relational databases so useful, such as transactions and SQL querying. In addition, because they were designed to be distributed, they were often more difficult to manage than traditional databases.
Additionally, NoSQL databases do not follow the same relational model. As a result, they do not offer the same level of ACID compliance. This can be a major downside for some businesses, as it can lead to data loss or inconsistency.
As a result, they could not support analytics on large data sets as efficiently as businesses needed, even though their scale and concurrency made them well suited to large-scale operational applications.
The New Era: Digital Transformation and Data-Driven Businesses
Third-generation databases are built to address the challenges of modern applications. They are SQL in nature but support real-time analytics, giving them the ability to support both transactional and analytical workloads in one single database.
These databases are designed to meet the needs of modern, cloud-based applications that require real-time data access and analysis. They offer many benefits over traditional relational databases, including:
- The ability to scale elastically to meet changing demand
- Support for high availability
- Robust security features
In addition, they are easier to use and require considerably less administration than older database technologies.
Meet SingleStore: A Distributed SQL Database That Powers Data-intensive Applications
SingleStore is a high-performance relational database management system (RDBMS) designed for real-time operational analytics. SingleStoreDB is the company's flagship product. It is a purpose-built DBMS optimized for streaming data and delivering real-time analytics.
SingleStoreDB can handle high volumes of data with low latency and supports a wide variety of data types. It is also scalable and easily deployed on-premises or in the cloud. SingleStoreDB is well suited for applications that require fast data ingestion, complex queries, and real-time analytics.
Data-intensive applications have three requirements:
- High data ingestion
- Low-latency query response for even the most complex queries
- High concurrency transactions on large data sets
Unlike traditional relational databases, SingleStore is a distributed SQL database that can be scaled elastically to meet the needs of modern applications. It is designed to handle high-throughput, low-latency workloads by distributing data and query processing across multiple nodes.
SingleStore is also highly scalable, allowing it to accommodate growth in data volume and query complexity. In addition, SingleStore offers a number of other features that make it an ideal choice for data-intensive applications, including real-time analytics, cloud integration, and support for hybrid transactional/analytical processing (HTAP).
What Makes SingleStore So Powerful?
SingleStore is designed for real-time transactional and analytical workloads, so it supports a wide range of application types, including e-commerce, web applications, financial analysis, and more. It is built on an architecture that combines row and column storage, allowing it to process large amounts of data very quickly.
Its most notable benefits include:
- Patented Universal Storage technology, optimized for the best performance:
  - Row store for OLTP workloads
  - Column store for analytical workloads
- Support for multiple data types (key/value pairs, JSON, time series)
- Well suited to either replace or augment databases such as Redis, MySQL, and Snowflake
How To Migrate to SingleStore From Oracle and MySQL?
At the enterprise level, most companies have numerous different database technologies in use. As a result, they often need to migrate data from one database type to another. This can be a complex and time-consuming process, as it requires significant planning and effort to ensure that data is migrated accurately and completely.
Data migration from Oracle, MySQL, and other legacy systems is daunting for several reasons:
- Extensive downtime: If not done correctly, downtime can cost a company dearly in terms of both money and customer confidence.
- Data loss: Incomplete or incorrect data migration can lead to data loss, which can be irrecoverable.
- Business interruption: If the migration process is not carefully planned, it can disrupt business operations, leading to lost productivity and revenue.
- Cost and complexity: Database migration projects typically take six to 12 months and can cost substantial amounts of money.
- Excessive developer resources spent on manual work: Developers spend months on schema mapping, procedure mapping, and custom code that can take months or even years to complete.
- Complex bi-directional setup required for fallback: Bi-directional data replication is required for complex database migration projects, as it provides a fallback option when things go wrong.
This combination of cost and complexity introduces a tremendous amount of risk to the data migration project. This is where Arcion comes in.
Arcion Mitigates Database Migration Challenges
Zero downtime database migration
Data is the lifeblood of any business, and downtime can be costly in terms of lost productivity and revenue. That's why zero-downtime migration is essential for minimizing disruptions when moving data to a new platform.
Change Data Capture (CDC) is the most effective technology for achieving this, replicating data in real time with low latency. CDC uses a log-based approach to track changes to data, making it easy to keep the source and target systems in sync. As a result, businesses can migrate data with confidence, knowing that their operations will not be interrupted.
Reduced cost and complexity
If you have a database with lots of schemas and tables, you would typically need a DBA or a set of developers to write scripts in order to migrate the data to a new schema. This creates a massive workload, making schema conversion less cost-effective and more time-consuming.
However, Arcion is built with automatic schema conversion and schema evolution support out-of-the-box, which makes the migration process much easier and less time-consuming. In addition, Arcion's continuous validation feature ensures that there is zero data loss during the migration. As a result, Arcion can reduce migration budgets and timelines by a minimum of 90%.
When you move a set of data from a legacy database like Oracle to a modern database like SingleStore, you need to have a clear path for fallback in the event of an issue.
Arcion supports bi-directional replication to allow fallback anytime. This will keep your data safe and consistent across both systems, while also giving you the flexibility to use the features of the new system without having to worry about losing any data.
How Does Arcion Handle Migration to SingleStore?
Change Data Capture (CDC) is a very popular and effective way to replicate data across databases. Arcion took the concept of Change Data Capture and made transactional connectivity across various databases easier, massively scalable, and highly available.
Arcion starts by reading the transaction logs of the source database (Oracle, MySQL). It then uses a proprietary process to extract only the changed data from those logs. From there, the changed data is stored in an optimized format in the target database.
The product continuously monitors the source database for changes and applies them to the target database in near real-time. This allows for a super-low-latency, always-up-to-date replica of the source database.
In addition, if the source database is down for any reason, the product can automatically failover to a standby database, ensuring that data is always available.
Three Pillars of Change Data Capture
Any database migration solution needs to answer three key questions and perform well in all these areas: transactional integrity, scalability, and low latency.
Transactional integrity is critical because it helps to ensure the accuracy and completeness of the data. Incomplete or inaccurate data can lead to errors in decision-making, and it can also jeopardize the security of the organization's information. By ensuring transactional integrity, organizations can be confident that their data is complete and accurate, and that it can be used safely and effectively.
Thanks to Arcion's proprietary zero-data-loss architecture, data is always accurate and complete. By taking periodic snapshots of the data being migrated, checkpointing allows for automatic restart if a failure occurs. This can be useful when migrating large amounts of data, as it minimizes the risk of losing any information.
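The checkpoint-and-restart idea can be sketched as follows. This is a toy illustration rather than Arcion's architecture: the migrator periodically records how far it has gotten, so a restart after a failure resumes from the last checkpoint instead of starting over:

```python
import json
import os
import tempfile

CHECKPOINT_FILE = os.path.join(tempfile.gettempdir(), "migration_checkpoint.json")

def load_checkpoint() -> int:
    """Return the index of the last successfully migrated row, or -1."""
    try:
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_row"]
    except FileNotFoundError:
        return -1

def save_checkpoint(last_row: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_row": last_row}, f)

def migrate(rows: list[dict], target: list[dict], checkpoint_every: int = 100) -> None:
    start = load_checkpoint() + 1          # resume after the last checkpoint
    for i in range(start, len(rows)):
        target.append(rows[i])             # stand-in for writing to SingleStore
        if (i + 1) % checkpoint_every == 0:
            save_checkpoint(i)             # periodic snapshot of progress
    save_checkpoint(len(rows) - 1)

if os.path.exists(CHECKPOINT_FILE):
    os.remove(CHECKPOINT_FILE)             # fresh run for the demo
rows = [{"id": n} for n in range(1000)]
target: list[dict] = []
migrate(rows, target)
print(len(target))  # 1000
```

If the process dies mid-run, rerunning `migrate` picks up at most `checkpoint_every` rows before the failure point rather than re-copying the whole table.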
Since Arcion captures data in real time, organizations that rely on transactional data don't need to worry about missed transactions: the software processes and delivers each piece of transactional data exactly once, without impacting the source database's performance.
In addition, Arcion's algorithms guarantee that the order of operations in SingleStore matches the complex order of operations in the source database (Oracle, MySQL).
When replicating relational databases, one of the challenges that data replication vendors have not yet solved is how to handle schema changes, including added or dropped columns and changed data types. This is a critical issue, as schema changes are common and can lead to data inconsistencies and pipeline downtime if not handled properly.
Arcion solves this problem by supporting schema changes out of the box, without any manual intervention. Changes are double-checked with the Data Validation Tool, so there is no need to manually edit the replication configuration or write custom code to handle them, while data consistency is still ensured. As a result, Arcion provides a much more streamlined and effective way to replicate relational databases.
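As a minimal sketch of what automatic schema evolution means (a simplified stand-in, not Arcion's mechanism), a replicator can compare each incoming row against the target schema and add any columns that appeared on the source mid-replication:

```python
def evolve_target_schema(target_schema: dict, incoming_row: dict) -> dict:
    """Add any new columns seen in the incoming row; keep existing ones.

    A toy stand-in for automatic schema evolution. A real tool would also
    handle dropped columns and type changes by issuing DDL on the target.
    """
    for col, value in incoming_row.items():
        if col not in target_schema:
            target_schema[col] = type(value).__name__   # infer a type name
    return target_schema

schema = {"id": "int", "name": "str"}
# The source added a "price" column while replication was running:
schema = evolve_target_schema(schema, {"id": 1, "name": "widget", "price": 9.99})
print(schema)  # {'id': 'int', 'name': 'str', 'price': 'float'}
```

The point is that the pipeline adapts to the new column on its own, instead of failing or silently dropping data until someone edits the replication configuration.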
Scalability is critical because, with large databases containing terabytes or petabytes of data, traditional approaches simply cannot keep up. They require a lot of manual tuning and often hit brick walls once they reach a certain size.
With Arcion's scale-up capabilities, businesses can take advantage of both inter-table and intra-table parallelism. This means the product can replicate multiple tables in parallel, as well as replicate a single table across multiple servers in parallel. Best of all, these processes happen on an as-needed basis, with no additional input from the user.
Scale-out capabilities allow businesses to run multiple clusters of nodes in Arcion at once, enabling them to handle arbitrarily large amounts of data with virtually limitless scalability.
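The two levels of parallelism can be illustrated with a short Python sketch (a toy model, not Arcion's engine): each table is split into chunks, and a thread pool copies chunks from different tables and from within the same table concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(table: str, chunk: list[int]) -> list[tuple[str, int]]:
    """Stand-in for copying a range of rows from source to target."""
    return [(table, row) for row in chunk]

def replicate(tables: dict[str, list[int]], chunk_size: int = 2) -> list[tuple[str, int]]:
    jobs = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for table, rows in tables.items():              # inter-table parallelism
            for i in range(0, len(rows), chunk_size):   # intra-table parallelism
                jobs.append(pool.submit(copy_chunk, table, rows[i:i + chunk_size]))
        return [r for job in jobs for r in job.result()]

tables = {"orders": [1, 2, 3, 4], "customers": [10, 11]}
copied = replicate(tables)
print(sorted(copied))
```

Chunks complete in whatever order the pool schedules them, which is why a real replicator pairs this with ordering guarantees at apply time.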
The self-hosted version of Arcion includes a cluster management framework that handles orchestration and availability. The cloud product is fully managed: all of these concerns are handled behind the scenes by Arcion, so you have no DevOps overhead.
At its core, the goal of any modern data platform is to get data from source to target in absolute real time. Change Data Capture-based replication ensures that changes are captured from the logs as soon as they occur at the source. They are then extracted and loaded in parallel, providing near-instantaneous updates at the target.
Furthermore, this process happens without directly querying the source database itself. This helps to ensure that the target database is always up to date, providing accurate information for downstream applications while not overloading the source database or impacting its performance.
When working with databases, it is often necessary to convert data from one format to another (think SQL to NoSQL). This can be a time-consuming and error-prone process, particularly with large data sets. With in-memory format conversions, the transformation and encoding of data happen quickly, at very high throughput.
Additionally, Arcion supports state-of-the-art data formats like Parquet for faster ingestion, allowing for seamless real-time delivery from Oracle or MySQL to a high-performing target like SingleStore.
Data delivery in real-time completely changes the game for organizations. Companies used to have to choose between accuracy and latency. With Arcion, you can have both.
See Arcion Cloud in Action
Watch the recorded live demo below from Arcion's Founder and CTO, Rajkumar Sen, ingesting 10M rows of the TPC-H schema from Oracle to SingleStore in a matter of minutes.
Raj then updated 10k+ rows in Oracle and, using log-based Change Data Capture, saw near-instant changes reflected in SingleStoreDB. Finally, Raj consolidated 10M rows from Oracle and 1M rows from MySQL into SingleStoreDB using Arcion Cloud. He performed transactions on Oracle and MySQL tables and saw those transactions replicated in real time.