Implementing CDC with Salesforce (Everything You Need to Know)

Luke Smith
Enterprise Solutions Architect
September 27, 2022
Matt Tanner
Developer Relations Lead
September 27, 2022
January 25, 2023
15 min read

Many organizations today use the data they generate as the foundation of their decision-making. This data can help propel the organization toward its business goals and shape its roadmap for the future. It is no longer enough to collect and store data from business processes and products; that data is expected to be analyzed to uncover insights that help the business excel. To achieve this, data must be readily available across the organization, from the point where it is generated to the point where it is used or analyzed. For example, customer feedback on products and services is vital because it can help fine-tune those products and services to better serve the target audience using a metric-driven approach. Customer relationship data is also important, since customers now expect to deal with companies that share part of their value system.

The way a company or organization interacts with its customers plays a prominent role in the success it achieves in the wider market. This has led to the development of specialized software that manages the entire customer journey and helps maintain ongoing relationships with customers. Customer Relationship Management (CRM) software is an indispensable tool in any organization's workflow that helps to attract prospective customers and retain loyal users. One of the largest and most trusted Customer Relationship Management software platforms on the market is Salesforce.

In this article, you will learn how to enable Change Data Capture in Salesforce. Change Data Capture is a data design pattern where changes in data in a source location trigger replication of that data in a target location, keeping the two data sources in sync. With Salesforce and Change Data Capture, you will be able to synchronize changes in data as they occur via streaming on Salesforce.

What is Salesforce?

Salesforce is a suite of cloud-based Customer Relationship Management (CRM) tools that can be used to track leads, manage communications between customers and companies, process orders, integrate data from external systems, prepare and visualize reports, and much more. Salesforce product offerings include Sales Cloud, Service Cloud, Marketing Cloud, Commerce Cloud, and Salesforce Platform. Because Salesforce is a cloud solution, it does not require any software to be installed or managed. Salesforce users can therefore scale their operations on the platform as their business grows without having to worry about maintaining or updating software or hardware dependencies.

Salesforce also has a large marketplace, AppExchange, for third-party applications that integrate with the platform. Users can browse AppExchange for applications that extend the capabilities of the base platform and serve specific needs. Specialized consulting partners are also available to help businesses craft a strategy that addresses their particular use case. Another advantage of Salesforce is its extensive documentation, which helps both new and experienced users better understand the platform. It also runs Trailhead, a free online learning platform with courses on Salesforce products.

What is CDC in Salesforce?

CDC can be used to keep data in sync across multiple stores that rely on the Salesforce data. Many times, Salesforce will be integrated with other systems throughout the organization. These other systems may rely on the most up-to-date data in order to perform as intended. This is where you’d want to use CDC within your Salesforce setup.

With Salesforce CDC, a change event is created whenever a Salesforce record is created, updated, deleted, or undeleted. The change event will contain details about what happened to the object including any new or changed fields and any header fields that contain information about the change.

There are many ways to implement CDC in Salesforce. These include native solutions, such as its built-in streaming capabilities, which come with limitations. More flexible options also exist, such as Arcion, which offer greater flexibility and scalability than traditional solutions. The built-in CDC mechanisms in Salesforce also require considerable knowledge and configuration to work as intended, and even then they have notable limitations compared with other options.

How to implement CDC with Salesforce

Data in Salesforce can be synchronized with external systems through Change Data Capture (CDC). This can be beneficial for analytics or for keeping a data warehouse up to date. Change Data Capture in Salesforce is implemented through an event-driven architecture. When data is changed in Salesforce through a create, update, delete, or undelete operation, a change event is published to the Salesforce event bus. From there, the event can be consumed by subscribers of that channel. The architecture follows a Publisher/Subscriber model: a change in a Salesforce record triggers a notification to every subscriber that has asked to hear about such changes. This is preferable to the alternative pull approach, where a client periodically queries the server for recent changes, which is a much less efficient way to replicate Salesforce data.

The streaming of change events in Salesforce means that data is synchronized in near real-time. Live data is then available in downstream target systems that are integrated with Salesforce. This approach is also scalable: instead of exporting snapshots of data at specific intervals, only the data that has changed gets sent to the event bus to be consumed by subscribers. Naively exporting data to capture changes is not a viable solution as the volume of data grows. Streaming change events conserves bandwidth, and it is the approach used in this article.
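To make the publisher/subscriber idea concrete, here is a minimal sketch of a toy in-memory event bus. It is not the Salesforce event bus or any Salesforce API; the `EventBus` class, its methods, and the record ID are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-memory stand-in for an event bus (not the Salesforce API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        """Register a handler that is called for every event on the channel."""
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: Event) -> int:
        """Push the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received: List[Event] = []
bus.subscribe("/data/AccountChangeEvent", received.append)

# Only the record that changed is pushed, not a snapshot of every record.
delivered = bus.publish(
    "/data/AccountChangeEvent",
    {"changeType": "UPDATE", "recordId": "001-example", "Name": "Acme"},
)
```

The subscriber never polls: it registers once and is called whenever a change is published, which is the property that makes the push model cheaper than periodic snapshot exports.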

How do I enable CDC in Salesforce?

Change Data Capture in Salesforce can be used on Standard objects found on the Salesforce platform and on custom objects created by Salesforce users. As explained in the previous section, CDC in Salesforce is event-driven, so an object must be enabled for change notifications before it publishes any change events. Subscribing to the channel that corresponds to an object is not enough on its own. If you want to enable CDC on a custom object, you have to create the object first. In this example, you will enable it on a Standard object called Account. To do so, follow the instructions below.

  • Sign in to Salesforce and go to Setup via the gear icon at the top right corner.
  • In the Quick Find box, search for “Change Data Capture” and click on the top result. You should find it under Integrations.
  • Select the entity you want to enable change events on. For this example, select Account.
  • Click on the forward arrow to move it to Selected Entities.
  • Click on Save.

Change Data Capture is now enabled on the object. If you subscribe to the associated channel, you will receive notifications.

Steps To Set up & Subscribe To Salesforce Change Data Capture

Salesforce allows users to subscribe to change events through CometD, the Pub/Sub API, or Apex triggers. In this example, you will use the EMP Connector, a thin wrapper around the CometD messaging library, to listen to events.

Once the EMP Connector has been set up, you can listen to the channels that carry change events. Channel names usually correspond to the name of the object (entity) and use the following convention.

/data/ChangeEvents - This channel listens for all changes in entities.

/data/<ENTITY>ChangeEvent - This channel is used for Standard objects. The variable <ENTITY> should be replaced by the entity of interest. For example, the channel for the Account entity will be /data/AccountChangeEvent

/data/<CUSTOM_ENTITY>__ChangeEvent - This is used for custom entities. A custom entity Employee will have a corresponding channel /data/Employee__ChangeEvent.
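The naming convention above can be sketched as a small helper. The function name `change_event_channel` and its `custom` flag are hypothetical, and the handling of a trailing `__c` suffix is an assumption about how custom object API names are usually written.

```python
# Channel that carries change events for all enabled entities.
ALL_CHANGES_CHANNEL = "/data/ChangeEvents"


def change_event_channel(entity: str, custom: bool = False) -> str:
    """Build the channel name for one entity (hypothetical helper).

    Standard objects map to /data/<ENTITY>ChangeEvent.
    Custom objects map to /data/<CUSTOM_ENTITY>__ChangeEvent.
    """
    if custom:
        # Assumption: accept either "Employee" or the API name "Employee__c".
        name = entity[:-3] if entity.endswith("__c") else entity
        return f"/data/{name}__ChangeEvent"
    return f"/data/{entity}ChangeEvent"


account_channel = change_event_channel("Account")
employee_channel = change_event_channel("Employee", custom=True)
```

Keeping the rule in one place avoids scattering hand-typed channel strings across a subscriber's configuration.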

Next, you will look at the structure of a change event message. A change event message is a JSON object with several fields and values. The payload field contains the ChangeEventHeader field, which holds key-value pairs that describe the change, followed by the record fields themselves. For example, ChangeEventHeader.changeType specifies the type of operation that took place: create, update, delete, or undelete. Below is the general structure of a change event message.

{
  "schema": "<schema_ID>",
  "payload": {
    "ChangeEventHeader": {
      "entityName": "...",
      "recordIds": "...",
      "changeType": "...",
      "changeOrigin": "...",
      "transactionKey": "...",
      "sequenceNumber": "...",
      "commitTimestamp": "...",
      "commitUser": "...",
      "commitNumber": "...",
      "changedFields": [...]
    },
    "field1": "...",
    "field2": "...",
    . . .
  },
  "event": {
    "replayId": <replayID>
  }
}

Now that you understand the channels associated with entities and the structure of a change event message, it is time to install EMP Connector and use it to subscribe to events.

First, if you don’t have them installed, you will need the following dependencies before you begin the steps below.

  • Git
  • Apache Maven
  • Java Development Kit 8 or later

Next, you will need to clone the EMP Connector repository from GitHub. In a terminal, run the following command to clone the repository:

git clone https://github.com/forcedotcom/EMP-Connector.git

Next, you will need to build the EMP Connector tool. First, make sure that you are in the EMP-Connector root directory. If you are still in the directory where you ran the clone command, run:

cd EMP-Connector

Then, use Maven to build the project by running the following:

mvn clean package

Once the Maven command has completed, a jar file will be generated that includes several example classes showing how to subscribe to a channel. You will use the DevLoginExample class, which allows you to pass a custom login URL. The relevant command is shown below.

java -classpath target/emp-connector-0.0.1-SNAPSHOT-phat.jar com.salesforce.emp.connector.example.DevLoginExample <login_URL> <username> <password> <channel>

The <login_URL> variable will be a Salesforce URL in this format - https://MyDomainName.my.salesforce.com

The <username>, <password> and <channel> should be replaced accordingly.

You can now create a new record or update an existing record in the entity that corresponds to the channel you subscribed to. For example, if you passed in /data/AccountChangeEvent in place of the <channel> variable in the command above, you should get a change event notification whenever you create or change records in the Account standard object. A sample of a change event notification with values is shown below.

{
  "schema": "IeRuaY6cbI_HsV8Rv1Mc5g",
  "payload": {
    "ChangeEventHeader": {
      "entityName": "Account",
      "recordIds": [
        ""
      ],
      "changeType": "CREATE",
      "changeOrigin": "com.salesforce.core",
      "transactionKey": "001b7375-0086-250e-e6ca-b99bc3a8b69f",
      "sequenceNumber": 1,
      "isTransactionEnd": true,
      "commitTimestamp": 1501010206653,
      "commitNumber": 92847272780,
      "commitUser": ""
    },
    "Name": "Acme",
    "Description": "Worldwide leader in gadgets of the future.",
    "OwnerId": "",
    "CreatedDate": "2018-03-11T19:16:44Z",
    "CreatedById": "",
    "LastModifiedDate": "2018-03-11T19:16:44Z",
    "LastModifiedById": ""
  },
  "event": {
    "replayId": 6
  }
}
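A subscriber usually needs to separate the header from the record fields before it can act on a notification. The sketch below parses a trimmed copy of the sample message with the standard `json` module. The `summarize` function and the record ID `001-placeholder` are hypothetical, and the code assumes the message has exactly the shape shown in this article, with `commitTimestamp` expressed in milliseconds since the Unix epoch.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample notification; the record ID is a placeholder.
raw_message = """
{
  "schema": "IeRuaY6cbI_HsV8Rv1Mc5g",
  "payload": {
    "ChangeEventHeader": {
      "entityName": "Account",
      "recordIds": ["001-placeholder"],
      "changeType": "CREATE",
      "commitTimestamp": 1501010206653
    },
    "Name": "Acme",
    "Description": "Worldwide leader in gadgets of the future."
  },
  "event": {"replayId": 6}
}
"""


def summarize(message: dict) -> dict:
    """Split a change event into header details and record fields."""
    payload = dict(message["payload"])         # copy so the input is untouched
    header = payload.pop("ChangeEventHeader")  # what remains are record fields
    committed = datetime.fromtimestamp(
        header["commitTimestamp"] / 1000, tz=timezone.utc
    )
    return {
        "entity": header["entityName"],
        "change_type": header["changeType"],
        "record_ids": header["recordIds"],
        "committed_at": committed.isoformat(),
        "fields": payload,
        "replay_id": message["event"]["replayId"],
    }


summary = summarize(json.loads(raw_message))
```

Copying the payload before popping the header keeps the original message intact, which helps if the same event is also logged or forwarded elsewhere.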

Salesforce Change Data Capture Implementation Example

Now that you know how CDC works in Salesforce, let's look at a scenario where Change Data Capture and Salesforce can prove useful. Imagine Alice works as a field agent at Acme Inc, selling products to end users, and Bob works in the marketing department.

At some point, the data showing the volume and type of sales will be of interest to the marketing department. The marketing department may be using a different Business Intelligence platform to visualize data but the actual records may be stored on Salesforce. It becomes imperative that any data analyzed by Bob in the marketing department as part of their campaign appraisal be up to date and reflect current realities.

In such a situation, Change Data Capture ensures that once Alice updates sales records in Salesforce, a notification is sent to the connector that feeds the downstream data warehouse Bob uses as the data source for his analytics. That way, the data across both departments is synchronized and any insights are derived from live data.
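The connector's job in this scenario can be sketched as a function that applies each change event to a downstream copy of the data. Here the warehouse is just a Python dictionary keyed by record ID; `apply_change`, the `deleted` holding area, and the record ID `001-example` are hypothetical. The sketch assumes update events carry only the fields that changed and handles only the four change types named in this article.

```python
from typing import Dict

Store = Dict[str, dict]


def apply_change(store: Store, deleted: Store, message: dict) -> None:
    """Apply one change event to an in-memory copy of the target table."""
    payload = dict(message["payload"])
    header = payload.pop("ChangeEventHeader")
    change_type = header["changeType"]
    for record_id in header["recordIds"]:
        if change_type == "CREATE":
            store[record_id] = dict(payload)
        elif change_type == "UPDATE":
            # Merge only the fields present in the event into the stored row.
            store.setdefault(record_id, {}).update(payload)
        elif change_type == "DELETE":
            if record_id in store:
                deleted[record_id] = store.pop(record_id)
        elif change_type == "UNDELETE":
            if record_id in deleted:
                store[record_id] = deleted.pop(record_id)
            store.setdefault(record_id, {}).update(payload)
        else:
            raise ValueError(f"Unhandled change type: {change_type}")


def make_event(change_type: str, fields: dict) -> dict:
    """Build a minimal event for the example (hypothetical record ID)."""
    header = {
        "entityName": "Account",
        "recordIds": ["001-example"],
        "changeType": change_type,
    }
    return {"payload": {"ChangeEventHeader": header, **fields}}


warehouse: Store = {}
deleted: Store = {}
apply_change(warehouse, deleted, make_event("CREATE", {"Name": "Acme", "Industry": "Retail"}))
apply_change(warehouse, deleted, make_event("UPDATE", {"Industry": "Manufacturing"}))
```

After Alice's create and update are applied, Bob's copy holds the merged row without a full re-export of the Account data.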

Limitations of the Custom-Code Approach for Salesforce Change Data Capture

Salesforce Change Data Capture is a powerful tool for organizations seeking to better utilize their data. Despite the extensive documentation of Salesforce and various developer resources, effective implementation of Change Data Capture in Salesforce requires intricate knowledge of the Salesforce platform. Also required is in-depth knowledge of the downstream database or data warehouse used as the target. 

Depending on the use case, custom code may need to be written to achieve the desired goal of synchronizing data. If you roll out your own custom solution, you will need to be aware of the nuances of Salesforce. For example, the value of the schema ID in the change event message changes whenever a new field is added to the entity or a field type is changed. It is best to use supported connectors provided by Salesforce and other trusted third parties unless you have in-house expertise and your organization is ready to maintain the codebase for the custom solution. One of the best ways to use Change Data Capture with Salesforce is a no-code platform that offers access to various Salesforce connectors. Wide connector support means that you can move data directly between Salesforce and an external system.
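As an illustration of the schema ID nuance, a custom subscriber could remember the last schema ID it saw for each entity and flag when it changes, for example to refresh a cached schema or adjust the target table. The `SchemaTracker` class below is a hypothetical sketch, and the second schema ID is made up for the example.

```python
from typing import Dict


class SchemaTracker:
    """Remember the last schema ID seen per entity (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, str] = {}

    def observe(self, message: dict) -> bool:
        """Return True when the entity's schema ID differs from the last one seen."""
        entity = message["payload"]["ChangeEventHeader"]["entityName"]
        schema_id = message["schema"]
        previous = self._last_seen.get(entity)
        self._last_seen[entity] = schema_id
        # The first event for an entity is not counted as a change.
        return previous is not None and previous != schema_id


def event_with_schema(schema_id: str) -> dict:
    """Build a minimal Account event carrying the given schema ID."""
    return {
        "schema": schema_id,
        "payload": {"ChangeEventHeader": {"entityName": "Account"}},
    }


tracker = SchemaTracker()
first = tracker.observe(event_with_schema("IeRuaY6cbI_HsV8Rv1Mc5g"))
same = tracker.observe(event_with_schema("IeRuaY6cbI_HsV8Rv1Mc5g"))
changed = tracker.observe(event_with_schema("made-up-new-schema-id"))
```

A check like this is one of several pieces of bookkeeping that a custom solution has to own and maintain itself.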

Implement Salesforce CDC with Arcion

When looking for a no-code solution to enable CDC for your Salesforce data, look no further than Arcion. Arcion offers both self-managed on-premise and fully-managed cloud products to fit your exact needs.

By using Arcion, it’s easy for organizations to build pipelines using Salesforce as a data source. Easily move data from Salesforce to an external system, such as a big data platform like Snowflake or Databricks, with zero code required. For this particular use case, Arcion is much easier and more flexible than the built-in CDC supported by Salesforce.

Benefits of using Arcion over Salesforce CDC include:

  • Agentless CDC support for 20+ sources and target databases and data warehouses
  • Multiple deployment types supported across cloud and on-premise installations
  • Configuration can easily be done through UI, with minimal effort and zero code
  • Automatic schema conversion & schema evolution support out-of-the-box (including SQL and NoSQL conversion) 
  • Patent-pending distributed & highly scalable architecture: Arcion is the only end-to-end multi-threaded CDC solution on the market that auto-scales vertically & horizontally. Any process that Arcion runs on Source & Target is parallelized using patent-pending techniques to achieve maximum throughput. 
  • Built-in high availability (HA): Arcion is designed with high availability built-in. It makes the pipeline robust without disruption and data is always available in the target, in real-time.
  • Auto-recovery (patent-pending): Internally, Arcion does a lot of check-pointing. Therefore, any time the process gets killed for any reason (e.g., database, disk, network, server crashes), it resumes from the point where it was left off, instead of restarting from scratch. The entire process is highly optimized with a novel design that makes the recovery extremely fast.  

Conclusion

This article gave you a detailed breakdown of Change Data Capture as implemented in Salesforce. You were first introduced to Salesforce, then to its event-driven architecture for Change Data Capture. The article covered the concept of entities and the convention used for channel names, and explained the structure of the change event message. You installed the EMP Connector and used it to subscribe to channels. Finally, the article presented a scenario for implementing Change Data Capture and discussed the limitations of a custom-code approach, concluding that for many users the best outcome is to go with a no-code solution.

To get a no-code CDC solution that works seamlessly with Salesforce, try out Arcion Cloud and have a scalable and reliable CDC-enabled pipeline set up in minutes. If you’re looking for a flexible on-premise solution, try out Arcion Self-Managed. Regardless of the deployment type chosen, Arcion provides unparalleled flexibility and scalability for data pipelines using Salesforce as a data source.

Matt is a developer at heart with a passion for data, software architecture, and writing technical content. In the past, Matt worked at some of the largest finance and insurance companies in Canada before pivoting to working for fast-growing startups.
Luke has two decades of experience working with database technologies and has worked for companies like Oracle, AWS, and MariaDB. He is experienced in C++, Python, and JavaScript. He now works at Arcion as an Enterprise Solutions Architect to help companies simplify their data replication process.