Data has become a valuable asset for organisations around the world, allowing businesses to simplify their operations and streamline their workflow. But its importance goes beyond just running effective operations -- with evidence-based decision-making on the rise, data is now used to inform critical decisions with concrete supporting evidence.
Companies are recognising this trend and leveraging technologies like big data analytics to ensure they have access to the right kind of information when it comes time to make key decisions. Data has quickly become an irreplaceable resource in almost all industries, powering advanced analytical models that can provide actionable insights and predictions.
In this digital age, businesses must use effective strategies to manage copious amounts of data collected daily. Change Data Capture (CDC) has become one of the most relied-upon methods for companies to seamlessly monitor, analyse, and track their data.
In this guide, you will get an understanding of what is Change data capture and how it works. We will also brief you on how you can connect Microsoft SQL Server CDC.
What is Microsoft SQL Server?
Microsoft SQL Server is a Relational Database Management System (RDBMS) developed by Microsoft. Microsoft markets MS SQL Server in different variants to suit the needs of a large, diverse audience. Each of these flavours come at an increasing price and with increased feature availability.
Some editions of Microsoft SQL Server include Standard, Web, Enterprise, Workgroup, and a few others. Microsoft also offers a cloud-based version of SQL Server known as Microsoft Azure SQL Database. Alongside Oracle Database and IBM's DB2, Microsoft SQL Server is one of the leading database solutions, a favorite of enterprises and large organisations. These organisations rely on SQL Server because of its security, reliability, and widespread technical support.
Unlike other relational database solutions, Microsoft SQL Server uses proprietary Transact-SQL (T-SQL) for its SQL operations. T-SQL is a dialect of SQL developed solely for use with Microsoft SQL Server. Within SQL Server there are proprietary data types as well that extend beyond traditional SQL data types. Queries in Microsoft SQL Server are declarative in nature and clearly specify what is to be retrieved, however, the query processor in SQL Server is in charge of optimising queries according to a sequence of actions called a query plan.
An extremely flexible solution, Microsoft SQL Server can be used as the data store for miniature applications running on a single machine to sizeable resource-intensive internet applications serving a massive workload of concurrent users. It is a versatile database solution that can be a good fit for most use cases that require a secure and reliable way to store data. Whether you're deploying the solution on-premise or on a cloud platform, such as Google Cloud, Amazon Web Services, or Microsoft Azure, SQL Server is widely available and supported. This is especially true for those who are working within the .NET and Microsoft development ecosystems
Features of Microsoft SQL Server
Here are some of the key features of Microsoft SQL Server :
- Accelerated Data Recapture: When running high-stakes business transactions on Microsoft SQL Server, it is critical to ensure any operation runs smoothly and safely. Unfortunately, there could be times when something goes wrong or a crash occurs. In such circumstances, it would be necessary to recover the database back to its initial state by undoing all transactions that were disrupted during the error or crash. The process of data recovery used to be tedious and often very time-consuming, taking anywhere from hours to days in some instances. However, Microsoft has revolutionised this process with the introduction of Accelerated Data Recovery in its Microsoft SQL Server. This advanced process utilizes Transaction Logs when recovering databases which drastically reduces the amount of time taken for the entire operation.
- Advanced Encryption: To ensure that sensitive information is kept secure, companies need reliable and efficient protection. Microsoft SQL Server 2016 has implemented a new Encryption Technology called AlwaysEncrypted which provides transparent Column Encryption without Database Administrators having access to the Decryption Keys. This offers advanced security, with encryption even of data within the system.
- Intelligent Query Processing: Microsoft SQL Server is acknowledged for its excellence when it comes to database management. The in-built Query Optimizer of this particular platform is one of the best features because it assists users to get the fastest possible query results. Recently, Microsoft has made strides in continuously improving the performance of this Optimizer with a number of enhancements. This includes Dynamic Memory Grants for Rowstore Tables, Batch Mode on Rowstore, Table Variable Deferred Compilation and more.
- Advanced Analytics: Microsoft SQL Server makes data analysis easy and efficient. With Microsoft SQL Server Analysis Service (SSAS), organisations can leverage data stored in their SQL Server and use it to quickly analyze their performance. SSAS enables users to identify the trends, develop insights for better decision making, and create customised solutions for different processes or purposes.
- Advanced Integrations: Microsoft SQL Server's Integration Services (SSIS) offer a great way for its users to accomplish various tasks including data extraction and transformation, loading, and migration. It is especially user-friendly due to its Graphical User Interface (GUI), which allows individuals with no coding experience to do things like extract data from many different sources such as relational databases, flat files, and XML files without having to write any code.
What is SQL Server CDC ( Change Data Capture )?
Change Data Capture is a software design pattern for databases whereby changes that occur in a source table or database are captured and replicated in another table or database. That means as data is updated in a source system, that change is applied in the target system. Microsoft SQL Server which is one of the leading database solutions that also supports this convention. This makes it a great tool for both database migration and data replication.
Change Data Capture in Microsoft SQL Server is implemented by the SQL Server agent to record INSERT, UPDATE, and DELETE operations as they occur in a table. The details of these recorded changes can be consumed in a relational format. The information in columns and other metadata required to replicate the change is then applied to a target environment, keeping the two separate data stores in sync, automatically.
Why do we use CDC in SQL Server?
The robustness of Microsoft SQL Server means that it is a good candidate for the implementation of an organization’s Change Data Capture strategies. There are several benefits to having Change Data Capture as part of the data migration and data replication strategy used within an organization.
In this section, you will be introduced to some of those benefits as they relate to Microsoft SQL Server. First, Change Data Capture can be used for real time loading of data into a data warehouse. Unlike Extract, Transform, Load (ETL) operations, where one database system is periodically migrating data to another database system, Change Data Capture implemented in Microsoft SQL Server can be used to stream changes and migrate data in a source SQL Server instance to a target system, like a data warehouse, as the changes occur. The data stored in the data warehouse can then be used for data analytics or by other Business Intelligence (BI) applications for the derivation of insights.
The benefit of doing this according to the Change Data Capture paradigm is that any insights gleaned from the data will be up to date. As many know, having real-time data is the key to making insightful decisions when they matter most. A delay in the data being processed could lead to lost opportunities or, in the case of predictive business analytics, completely stunt the use case.
Second, Change Data Capture in Microsoft SQL Server can be used to synchronize the data stored in an organization’s on-premises database with its cloud database. For Microsoft SQL Server, the equivalent cloud database offering - Microsoft Azure SQL Database can be used as the target that closely mirrors the on-prem data, so any updates to the on-prem data is replicated in the cloud. The cloud database can then be used as a data source by other data-intensive applications like reporting or analytics software. Using CDC as a data migration tool can ensure that databases are kept in sync and data quality remains intact, an important part of a data protection strategy.
Third, an audit application can be built into an organization’s workflow using Change Data Capture in Microsoft SQL Server to ascertain when data changed, why it was changed and by who. More will be discussed on Change Data Capture as an audit solution in a subsequent section of this article.
Understanding Microsoft SQL Server Incremental Data Load
Incremental data load can be defined as the process of copying only new data or updated data from a source database to a destination database. The data loaded into the destination database at the time the incremental load process begins includes modified or new records, every other record that is identical in both the source and destination database is ignored.
The main advantage of incremental data load is time savings. It takes less time to load only relevant data compared to setups where data in the destination or target database is completely dropped before the new state in the source database is copied over. By using Microsoft SQL Server as a data loader, it is possible to implement incremental data load through a few different approaches. These include adding timestamp/rowversion datatypes columns, created/modified date and time columns, Change Data Capture or through replication. When using incremental data load with Microsoft SQL Server, it is important to make sure that there is no misidentification of source rows that have changed so that data integrity can be preserved in the target database.
Installation and Architecture
To use Change Data Capture in Microsoft SQL Server, it must be enabled. Previously, Change Data Capture was only available in SQL Server Enterprise edition, however from SQL Server 2016, it can be used across other editions.
Change Data Capture in Microsoft SQL server works by using two SQL Server Agent jobs that are created once CDC is enabled on a database. The first agent is responsible for populating the database tables with the information that has changed while the second agent cleans old records according to the configurable retention policy.
When a database is enabled for Change Data capture, the cdc schema, cdc user, metadata tables and other objects are created for that database. A database can also be checked to know if it is currently enabled for CDC by querying the is_cdc_enabled column in the sys.databases catalog view.
Change Data Capture as an Audit Solution
Change Data Capture can be used in conjunction with Microsoft SQL Server as an audit solution to track changes in a table’s data. The database operations for INSERT, UPDATE and DELETE can be recorded with detailed information on a mirrored table whenever a change is detected. This can be more efficient than setting up individual triggers to write to audit tables.
Enabling CDC in SQL Server will provide an audit trail to see the change in the structure of the data. Additional columns are created in the mirrored table to record a description of the changes. Every INSERT statement has a corresponding record that shows the inserted value, the DELETE statement equally has a record for the deleted value, however, the UPDATE statement has two records. The first record for the UPDATE statement shows the data before the change and the second record shows the data after the update has been performed.
Some of the additional columns that are created as part of the audit functionality are __$start_lsn and __$end_lsn, which show the commit log sequence number assigned by SQL Server Engine for its internal use to track the change, __$seqval, which tracks the order of a specific change in the hierarchy of other changes in the same transaction, __$operation, which identifies the kind of operation (insert, delete, update), etc.
How to Implement CDC in SQL Server : Guide
There are two ways to enable Change Data Capture in Microsoft SQL Server. CDC can be enabled either at the database level or on specific tables. To enable Change Data Capture at the database level, a member of sysadmin fixed server role has to run a stored procedure (sys.sp_cdc_enable_db) in the database context. The content of the stored procedure is shown below.
If you are using SQL Server Management Studio, you can go to View > Template Explorer > SQL Server Templates and then select the change data capture sub-folder to access the templates.
To enable Change Data Capture for specific tables, it needs to have been enabled at the database level. CDC at the table level can be enabled by members of the db_owner fixed database role group. You can run the procedure below to enable it.
In the above procedure, the @filegroup_name parameter is used to specify the filegroup that will contain the change table associated with this instance. By default, the change table is located in the default filegroup of the database.
Change Data Capture can be disabled by a member of the sysadmin fixed server role group by running the stored procedure sys.sp_cdc_disable_db (Transact-SQL) in the database context. The content of the stored procedure is shown below.
Similarly, Change Data Capture can be disabled on a table by a member of the db_owner fixed database role via the stored procedure below.
Auditing DML Changes
After successfully enabling Change Data Capture at the database level and on specific tables, INSERT, UPDATE and DELETE operations get mirrored in the change table alongside values populated in the additional columns created to track the changes. For each Data Manipulation Language (DML) change such as an INSERT, for example, the data stored in the change table will vary according to the type of operation. In the __$operation column on the change table which was discussed in an earlier section, its value will be:
- 1 for DELETE
- 2 for INSERT
- 3 for an UPDATE before a change
- 4 for an UPDATE after a change
The change table will have new records that correspond to INSERT, UPDATE or DELETE operations performed on the source table. That way data in the change table can be queried to get the time a change occurred in a specific record, the type of change, the value associated with the record before and after the change occurred, and the order of the change and the specific columns that were updated.
SQL Server CDC: Pros & Cons
Most design patterns in data and software engineering require trade-offs and Change Data Capture with Microsoft SQL Server is no exception. Let’s take a look at some advantages and disadvantages of using SQL Server Change Data Capture as a strategy in your data stack.
Advantages of SQL Server CDC ( Change Data Capture )
- SQL Server Change Data Capture does not have a requirement that the table to be replicated must have a primary key. Unlike other comparable solutions, this means that tables that do not have a primary key can have their data replicated in a target table.
- The Change Data Capture functionality in SQL Server can be used for Always On Availability Groups in SQL Server because the data captured by CDC is easily reproduced across replicas.
Limitations of SQL Server CDC ( Change Data Capture )
- Even though a Change Data Capture mechanism is native to SQL Server, there is still a significant amount of custom coding required if the changes recorded as part of the replication are to be fed to a data warehouse. That means that plumbing needs to be done to have SQL Server replication work with a connected data warehouse.
- To make sure that data integrity is maintained when using SQL Server CDC, ongoing maintenance of the database must be performed so that changes are not misrepresented, especially for new tables and data types.
In this article, you were introduced to the RDBMS - Microsoft SQL Server, you were also introduced to what Change Data Capture is as relates to SQL Server and its application as an audit solution. You were then shown how to enable CDC to migrate data in SQL Server both at the database and table level. Finally, some advantages and disadvantages of SQL Server Change Data Capture were enumerated to enable you to have a well-rounded understanding if faced with choosing SQL Server as a Change Data Capture solution.
Looking for a more robust CDC solution that easily supports many different sources and targets, including SQL Server? Arcion is one of the best data replication and data migration tools to use when implementing CDC with Microsoft SQL Server. By using Arcion, it’s easy for organizations to build pipelines using SQL Server as a source or destination. Easily move data from SQL Server to or from other platforms such as MySQL, MongoDB, and many other popular databases. Arcion also opens up use cases to efficiently move data into big data platforms such as Snowflake or DataBricks, with no code required. For this particular use case, Arcion is much easier than the built-in CDC supported by Microsoft SQL Server.
Benefits of using Arcion over Microsoft SQL Server CDC include:
- No-code connector support for 20+ sources and target databases and data warehouses
- Multiple deployment types supported across cloud and on-premise installations
- Configuration can easily be done through UI, without any SQL commands, SQL Server configuration changes, or code
- Automatic schema conversion & schema evolution support out-of-the box (including SQL and NoSQL conversion)
- Patent-pending distributed & highly scalable architecture: Arcion is the only end-to-end multi-threaded CDC solution on the market that auto-scales vertically & horizontally. Any process that Arcion runs on Source & Target is parallelized using patent-pending techniques to achieve maximum throughput.
- Built-in high availability (HA): Arcion is designed with high availability built-in. It makes the pipeline robust without disruption and data always available in the target, in real-time.
- Auto-recovery (patent-pending): Internally, Arcion does a lot of check-pointing. Therefore, any time the process gets killed for any reason (e.g., database, disk, network, server crashes), it resumes from the point where it was left off, instead of restarting from scratch. The entire process is highly optimized with a novel design that makes the recovery extremely fast.
Arcion supports many different deployment types, including self-managed on-premise, fully-managed cloud, hybrid, and even multi-cloud deployments. If you’re looking for a flexible on-premise solution, try out Arcion Self-Managed. To get started even more quickly, try out Arcion Cloud and have a scalable and reliable cloud data migration tool set up in minutes.