The velocity of data in enterprises increases by the day and the demands on that data increase in parallel. The expectation as the data is increasingly generated, distributed, collected, and analyzed, is to extract the maximum business value from it, giving an edge over competitors. To sustain such a high-performance environment, data strategies must be employed to make sure that data is up-to-date, relevant, and actionable.
The choice of database systems used by an enterprise and its inherent capabilities can either help or hinder an enterprise's growth. Upgrading systems that store data or adding additional platforms where data can be analyzed is a common way to help an organization achieve its data-centric goals. In addition to this, techniques such as Change Data Capture, which we will discuss later on in this article, can be utilized to move data to where it is needed, generally in real-time as it is created.
This article will focus on using Change Data Capture (CDC) with Sybase. Before we get into the low-level details we will do an introduction to Sybase and its key features. Then, we will cover the generic usefulness of Change Data Capture as a replication, backup, and migration technique. Lastly, before we cover the exact steps required to implement it, we will break down how Change Data Capture works in Sybase with Sybase Replication Server.
Once we cover the more theoretical Sybase and CDC subjects, we will show a step-by-step guide on how to set up a Sybase and Snowflake Change Data Capture architecture with Arcion. By the end of this article, you will be familiar with Sybase, its key features and components, what Change Data Capture entails, and how to implement Change Data Capture in Sybase. With that, let’s jump right into looking at Sybase and how to implement CDC.
What is Sybase?
An extremely popular software company amongst enterprises throughout the 1990s and into the 2000s, Sybase provided data systems solutions for managing and analyzing information in relational databases at scale. Sybase was acquired by SAP in 2010 and remained a wholly-owned subsidiary until 2012. In 2012, Sybase was fully integrated into the SAP ecosystem.
SAP discontinued the use of the Sybase moniker in 2014, even though the name Sybase is still referenced in materials found online. Sybase had several data products within its product portfolio, including Sybase Adaptive Server Enterprise (ASE), Sybase IQ, Sybase SQL Anywhere, and Sybase Advantage. Sybase was generally used to refer to these lines of products, however, each solved a unique business problem. Since the products have been fully integrated into the SAP platform, Sybase Adaptive Server Enterprise is now called SAP Adaptive Server Enterprise, similarly, Sybase IQ is now called SAP IQ.
Sybase Adaptive Server Enterprise (ASE) was the main enterprise solution that catered to the Online Transaction Processing (OLTP) paradigm. The platform was a row-based storage Relational Database Management System (RDBMS) that was well suited to running reporting and satisfying large storage needs. Sybase IQ on the other hand was the columnar analytics database software that utilized a column-based storage system. This design pattern enabled it to store data in columns where the data came from instead of associating the data in rows. The advantage of Sybase IQ was that it was efficient at producing reports and storing archival data with a much smaller footprint compared to a row-based storage system like Sybase Adaptive Server Enterprise (ASE).
Key Features of Sybase
As was explained in the previous section, Sybase refers to a suite of tools. In this section, the key features of two of those products, Sybase Adaptive Server Enterprise (SAP ASE) and Sybase IQ (SAP IQ), will be reviewed.
Key Features of Sybase Adaptive Server Enterprise (ASE)
- Sybase Adaptive Server Enterprise (SAP ASE) delivers fast and reliable Online Transaction Processing (OLTP). It contains a proven SQL database server that provides high-performance OLTP.
- It is possible to deploy SAP Adaptive Server Enterprise (ASE) as an on-premise solution or in the cloud through an Infrastructure as a Service (IaaS) provider.
- SAP Adaptive Server Enterprise (ASE) is designed to be efficient and brings about lower operating costs through effective resource management.
- It is highly optimized to support large workloads with built-in disaster recovery features and it also provides high availability of data.
Key Features of Sybase IQ (SAP IQ)
- Sybase IQ provides a proven columnar relational database server that is capable of enabling real-time data analytics.
- It is compliant with SQL standards and can be used to query massive volumes of data.
- Sybase IQ (SAP IQ) is especially suited to perform analytics on large workloads and big data. It is scalable and can be deployed on-prem or in the cloud.
What is Change Data Capture (CDC)?
Change Data Capture can be defined as a method used to track and determine data changes in a “source” database. Once a change is detected, the Change Data Capture process will apply these changes to a secondary database or datastore, known as the “target”. Change Data Capture is therefore a technique for replicating real-time database changes from a source system to a target system and keeping the two in sync. The downstream system, or target, will mirror the changes that occurred in the primary system, the source.
The need for data extraction and analytics has made it paramount to have access to the latest data at the time of creation. This is because analytics using stale data can lead to misleading results. With Change Data Capture, data is transported from the point of generation to data warehouses, analytics systems, and business intelligence (BI) platforms for processing and reporting. Change Data Capture enables the replication, migration, and flow of this data from where it is produced to where it is needed.
There are various ways of implementing Change Data Capture in an organization or as part of a data project. Some of the most common ways of implementing Change Data Capture include:
- Using audit columns in databases to track changes to the data.
- Using table deltas to compare the differences between the snapshot of data contained in the database tables.
- Using trigger-based Change Data Capture that captures changes in the data based on events or SQL statements that may have occurred.
- Using a log-based approach where database events are stored in transaction logs that can be replayed.
Some of the main benefits of Change Data Capture include the ability to move data efficiently across a network from a data source to a target, to easily build a real-time data pipeline process, and to keep data in sync across multiple systems as part of a robust replication/backup solution.
What is Sybase CDC?
Change Data Capture can be performed in the Sybase ecosystem through Sybase Replication Server. Replication Server is available to the Sybase Adaptive Server Enterprise (ASE) to easily enable CDC. It is used to implement incremental loading of data by using a standard Extract, Transform, and Load format. This data can be sent to other components downstream. Change Data Capture in Sybase Adaptive Server Enterprise (ASE) can be configured to replicate source tables and to create and drop replication definitions, connections, and subscriptions.
Before the CDC Provider Sybase Replication Server component can be configured, the Replication Server has to be installed and set up correctly. To set up the Replication Server, you must edit the interfaces file to specify the source database, target database, and the Replication CDC Service name. The interfaces file contains network definitions for each SAP Replication Server and data server in a replication system and is a crucial part of the CDC setup.
How Does Sybase CDC Work?
The Change Data Capture component in SAP Adaptive Server Enterprise is usually used in conjunction with SAP Data Services. SAP Data Services is a data transformation tool that has ETL capabilities. The CDC component is the SAP Sybase Replication Server (SRS) which, even though it is great at real-time data replication, cannot transform data on the fly. SRS supports a wide variety of Database Management Systems (DBMS) such as ASE, Oracle, and Microsoft SQL Server.
SRS can be used to perform unidirectional or bidirectional data replication across heterogeneous databases in your enterprise. For example, in unidirectional replication, data transactions are replicated one way; from a primary database to the replicated databases. In bidirectional replication, data transactions are replicated both ways; from a primary database to the replicated databases and vice versa.
Individual tables can be activated for Change Data Capture through the DATA CAPTURE CHANGES attribute. This attribute can be attached to existing tables and interacts with the replication agent. This provides customization and configurability to allow users to specify exactly which data they want to be replicated.
SRS Change Data Capture Architecture
In this section, you will be introduced to Sybase Replication Server (SRS) architecture and how it relates to Change Data Capture. You will see how you can integrate the components discussed in the previous sections to achieve a real-time data replication solution with data transformation. The components that will be a part of this architecture are listed below.
- Source Database - SAP Sybase ASE or other supported DBMS like Oracle, MSSQL, etc.
- SAP Sybase Replication Server
- SAP Sybase Adaptive Server Enterprise (for use as a staging database)
- SAP Data Services
- SAP Sybase PowerDesigner
In the diagram above, the source database, where changes are to be captured, can be Sybase/SAP ASE or another supported DBMS. The replication agent monitors the source database for changes and sends those updates to the SAP Sybase Replication Server. The SAP Sybase Replication Server stores those changes temporarily in a staging database, which in this case is also SAP Sybase ASE. The staging database is used as an intermediary to record changes that have occurred. SAP Data Services utilizes the staging database to retrieve data that are a part of the CDC process and transforms that data for delivery to other target databases. The SAP Sybase PowerDesigner is used to automate the process of generating the scripts associated with the movement of data from one component to the other and also enables the modeling of data on the source database and the staging database. This ensures that only modified data is captured as part of the CDC process. The staging SAP Sybase ASE instance also contains a job scheduler that periodically purges records that have been retrieved by SAP Data Services and are no longer needed. These five components work together to deliver an end-to-end real-time data replication and data transformation solution using tools available within the SAP ecosystem.
How to Setup Change Data Capture with Sybase, Snowflake, and Arcion
Based on the above architecture, using the SAP ecosystem to implement Change Data Capture can be relatively complex. Luckily, Arcion can reduce the complexity of the Change Data Capture architecture and deliver even more scalable pipelines. Arcion is an enterprise Change Data Capture platform that simplifies the process of managing real-time data without compromising speed and scalability.
Arcion has two deployment options, an on-premises option, Arcion Self Managed, and a fully-managed option, Arcion Cloud. Arcion has low latency, is infinitely scalable, and offers guaranteed consistency with zero impact on the source database by using agentless, log-based CDC.
Lastly, Arcion is easy to set up and use through the optimized point-and-click interface which allows users to set up pipelines in a matter of minutes. In the below example, we will take a look at the steps required to create a real-time data pipeline to move data from SAP ASE to Snowflake. First, let’s take a quick look at what Snowflake is for those who may be unfamiliar.
Snowflake is a cloud-based Software as a Service (SaaS) application that provides data warehouse services alongside data storage and analytics. Snowflake supports the Change Data Capture pattern through Snowflake streams. Snowflake is immensely popular amongst data engineers because it is highly scalable, and efficient, and does not require any special installation or maintenance since it is cloud-based and fully managed.
Moving data from SAP ASE to Snowflake, or another big data or analytics platform, is an extremely popular use case. Let’s dive into the steps required to set up a CDC pipeline from SAP ASE to Snowflake CDC with Arcion.
Step 1: Download and Install Arcion Self-hosted
First, we will need to download and install Arcion Self-hosted. This will require a few steps, including downloading Replicant, creating a home directory for it, and adding your license. Each step can be seen in detail by referencing our quickstart guide in our docs.
Next, we need to configure and enable Arcion to connect with Sybase and Snowflake. We will refer to the directory where Replicant was downloaded to as $REPLICANT_HOME in the following steps. Now, let’s go ahead and configure all of the connections with Arcion.
Step 2: Set Up Connection Configuration for Sybase ASE
From $REPLICANT_HOME, navigate to the sample connection configuration file:
If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.
Otherwise, you can put your credentials like usernames and passwords in plain form like the sample below:
Replace the following placeholders in the example above with:
HOSTNAME: hostname of the SAP ASE server
PORT_NUMBER: port number of the SAP ASE server
DATABASE: the name of the SAP ASE database to connect to
USERNAME: the username of the DATABASE user
PASSWORD: the password associated with USERNAME
After this is complete, save the file.
Note: If you want to use the bcp utility for extracting data from your Source ASE, you’ll need to specify some additional parameters in the connection configuration file. For more information, see Use bcp Utility for Extraction.
Step 3: Set Up Extractor Configuration for Sybase ASE
From $REPLICANT_HOME, navigate to the Extractor configuration file:
Arcion supports both snapshot and realtime modes for SAP ASE. In this example, we will set the configuration to use realtime mode.
Step 3.1: Use realtime mode
First, make sure that the ASE account you specified in the Replicant connection configuration file has the following permissions granted:
After that, you can specify extraction parameters under the realtime section of the configuration file. Below is a working sample:
It’s important to note that the fetch-interval-s parameter determines the interval between each CDC fetch cycle. Always make sure to keep its value above or equal to 10. For more information, see Limitations in the docs.
For a detailed explanation of configuration parameters in the Extractor file, read Extractor Reference.
Step 4: Set up connection configuration for Snowflake
From $REPLICANT_HOME, navigate to the sample Snowflake Connection configuration file:
The configuration file has two parts:
- Parameters related to target Snowflake server connection.
- Parameters related to stage configuration.
For connecting to the target Snowflake server, you can choose between two methods for an authenticated connection:
- RSA key pair authentication
- Basic username and password authentication
In this example, we will simply connect to the server using the basic username and password authentication. For simplicity, we will specify the credentials in a plain form in the connection configuration file as shown below:
In the example above, replace the following with the details about your Snowflake instance:
- SNOWFLAKE_HOSTNAME: the Snowflake hostname. The hostname is in the format ACCOUNT_NAME.REGION_ID.snowflakecomputing.com or ACCOUNT_NAME.snowflakecomputing.com—for example, replicate_partner.snowflakecomputing.com.
- PORT_NUMBER: the port number of the Snowflake host
- WAREHOUSE_NAME: the name of the Snowflake warehouse
- USERNAME: the username to connect to the Snowflake server
- PASSWORD: the password associated with USERNAME
Once these details are added, save the snowflake configuration file.
Step 5: Set Up Applier configuration
From $REPLICANT_HOME, navigate to the sample Snowflake Applier configuration file:
The configuration file has two parts:
- Parameters related to snapshot mode.
- Parameters related to realtime mode.
Step 5.1: Parameters related to realtime mode
To operate in realtime mode, we will use the realtime section in the snowflake.yaml config file opened at the start of Step 5. Below is an example of how to set up the config file for realtime.
When operating in realtime mode, there are a few details that you will want to pay attention to for the pipeline to work correctly:
- Make sure that the number of threads is equal to the number of tables.
- Enable PK/UK logging if the Source table has PK/UK enabled. If the table does not have any PK, then only enable full logging.
- You might want to select any table in your Snowflake dashboard while operating. Due to a Snowflake limitation, problems may arise if the table name contains lowercase. So you need to execute the following command first:
- After executing the preceding command, you can select a table with lowercase names by surrounding the names with double quotation marks.
Now that your Snowflake connector is configured, we can run Replicant, Arcion’s program for running the CDC pipeline.
Step 6: Running Replicant
From $REPLICANT_HOME, run the following command to execute Replicant in realtime mode:
Once the command is executed, Replicant will start up the pipeline. In real-time mode, Replicant first creates the destination schemas if they are not already present. If the destination schemas are present, Replicant appends to the existing tables.
In real-time mode, Replicant starts replicating real-time operations obtained from log-based CDC. By default, real-time mode starts replicating from the latest log position, but a custom start position can be specified by the user in the real-time section of the extractor configuration file.
With this, you have set up a real-time CDC pipeline with Arcion. As changes occur in the source Sybase database, they will be replicated in your Snowflake instance. In a matter of minutes, with no code, we have created a robust pipeline that will enable us to further many use cases, including running real-time analytics through the Snowflake platform.
In this article, we looked at the tools associated with Sybase for Change Data Capture and explored the historical underpinnings of how Sybase became integrated with SAP. We also walked through the pillars and benefits of Change Data Capture and the different approaches to Change Data Capture. We took a deep dive into Sybase CDC with Sybase Replication Server including a look at the architecture of the Sybase Replication Server (SRS) to show how the various components fit together. Finally, as an alternative to the complex SAP replication setup, we looked at a simple step-by-step approach for real-time Sybase CDC using Arcion. Setting up Sybase CDC with Arcion is far easier, scalable, and more reliable than the other methods we reviewed.
SAP ASE is only available with the Arcion Self-hosted version today. To get started with Arcion today, download Arcion Self-hosted for free and see Arcion in action yourself (no payment info required). If you prefer, you can also book a personal demo with our Sybase CDC experts today to get started!