Connect Oracle to Snowflake: 2 Easy Methods

Luke Smith
Enterprise Solutions Architect
December 20, 2022
Matt Tanner
Developer Relations Lead
December 20, 2022

Moving data from Oracle to Snowflake is an extremely common scenario with a few different solutions. Some solutions support real-time data synchronization while others are batch-based, which creates a lag in the data reaching the target platform. In this article, we will look at two of the easiest ways to connect Oracle and Snowflake. The first is by leveraging Arcion and the second is by creating a manual ETL process.

Prerequisites

The following are the prerequisites needed to understand this write-up and to follow along with examples connecting Oracle to Snowflake:

  • Knowledge of Databases and Data Warehouses
  • Knowledge of Structured Query Language (SQL)
  • Understanding the type of data to be transferred
  • An Oracle account
  • A Snowflake account

What is Oracle Database?

Oracle is a popular Relational Database Management System (RDBMS) produced and marketed by Oracle Corporation. It is one of the most used RDBMS to store, organize, and retrieve information. Oracle was the first database to use Enterprise Grid Computing (EGC) to perform tasks. EGC helps in the retrieval of information from any location through a group of computers connected to a network. This makes Oracle a flexible and cost-effective way to manage information and applications. 

Oracle runs on multiple types of servers across different operating systems such as Windows, UNIX, Linux, and macOS. Oracle is a multi-model database system where each database is a collection of data treated as a unit to store and retrieve related information. Oracle also offers in-memory, NoSQL, and MySQL databases, available both on-premises and via Oracle Cloud.

Oracle offers five database editions: Standard Edition One, Standard Edition, Enterprise Edition, Express Edition (XE), and Personal Edition. For further information, see Oracle's official documentation.

What is Snowflake?

Snowflake is a cloud-based Software-as-a-Service (SaaS) data warehouse and analytics platform which enables customers to analyze data. Its cloud data warehouse is built on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud infrastructure, providing a platform for storing and retrieving data. It supports standard ANSI SQL and handles both structured and semi-structured data such as JSON, Parquet, XML, and many other formats.

With Snowflake, there is no hardware or software to install, configure, or manage since the platform is fully managed by Snowflake itself. Organizations do not need dedicated resources for setup, maintenance, and support of the platform, as they would for in-house, physical operations, because everything runs in the cloud with near-zero administration.

Snowflake’s unique architecture separates its storage layer from its compute layer, so each can scale, and be paid for, independently. It is very flexible and highly scalable, accommodating massive amounts of data and many concurrent users of its computing power. For further information on Snowflake, visit the official Snowflake website.

Oracle To Snowflake Replication Advantages

Choosing the right analytics tool for your data warehousing needs can greatly enhance the overall performance of your business. Although there is a massive upside to using analytics tools, choosing the right one can be challenging. Replicating data from Oracle to Snowflake for data analytics offers many advantages over the traditional Oracle environment. Snowflake gives you a superior edge by leveraging a purpose-built big data platform that excels at deriving insights through analytics. A few of the benefits of replicating data from Oracle to Snowflake are listed below:

  • Minimized Administration: Snowflake is a completely managed service. You can load, query, and share data without upgrading hardware, partitioning tables, or building indexes as you would with a traditional database, so minimal engineering resources are required.
  • Semi-Structured Data: Snowflake allows you to work with semi-structured data by simply pulling the semi-structured data into the data pool with ease. This can be difficult using Oracle.
  • The Flexibility And Scalability Of Loading Data: Snowflake is highly scalable and reliable. It allows the loading of large datasets from various sources such as web applications, on-premise databases and files, and cloud storage. Its elasticity gives you the liberty to scale only when needed and pay for only what you use.
  • High Computational Power: Snowflake is easy to use and has a massive amount of available computational power. This allows many concurrent users to run complex queries simultaneously with extremely fast execution.

Best Ways To Set Up Oracle To Snowflake Integration

How do you export data from an Oracle database in a format that Snowflake can ingest, and then import that data into Snowflake? This may sound like a simple process, but not knowing how to go about it can make it a daunting task. Let’s dig into two methods for achieving this and how to avoid common pitfalls along the way.

Method 1: Using Arcion To Set Up Oracle To Snowflake Integration

Using Arcion allows users to use an easy-to-use UI to set up a replication pipeline using CDC. In a few steps, you’ll have Oracle and Snowflake connected to Arcion. Once connected, a snapshot of the Oracle database will be loaded into Snowflake and any subsequent changes will also be synced in real-time.

The first step is to log into Arcion and set up a new replication by clicking the New button on the Replications screen.

The next step is to choose a Replication mode and a Write mode. Once selected, click the Next button.

> For more details on the different Replication and Write modes, check out our docs.

On the  Source screen, select Oracle as the source. If no Oracle Source is defined, click Create New and follow the instructions to create one. Once selected, click Continue to Destination.

On the Destination screen, select Snowflake as the target destination. If no Snowflake Destination is defined, click New Connection and follow the instructions to create one. Once selected, click Continue to Filter.

Next, on the Filter screen, we will choose what schemas, tables, and columns we want to replicate from Oracle to Snowflake. Choose how you would like to filter the data by checking off the applicable checkboxes on the left and clicking Start Replication.

From here, the replication will start. This next screen will allow you to monitor the process and inspect any errors that occur.

Because we selected Snapshot Mode as the replication type, once the data has been moved from Oracle to Snowflake, Arcion will shut down the process. If you would like to consistently replicate all changes through CDC, you can choose Full or Real-time mode. Once our Snapshot job completes, we will see a summary.

With that, you have connected Oracle to Snowflake using Arcion and migrated your data. 

Method 2: Manual ETL Process To Set Up Oracle To Snowflake Integration

There is no direct link for loading data between Oracle and Snowflake. Therefore, a mediator that acts as a connector is used to join the two platforms and implement data replication between them. There are several ways to export data from an Oracle database, but you have to consider exporting it into a format that would not require further transformations before importing it to Snowflake. There are many possible formats like CSV, XML, and JSON. In this example, we will focus on CSV since it is the simplest way.

The steps described below are the steps and processes involved in moving data from Oracle to Snowflake using a CSV file.

Step 1: Extract Data From Oracle To CSV Using SQL*Plus

SQL*Plus is a command-line query tool installed with every Oracle Database server or client installation. Its SPOOL command redirects the results of a SQL query to a file, which can be used to produce a CSV. The spool file is not finalized until spooling is turned off. If the spool file does not exist, a new file is created; if it does exist, it is overwritten. An example of writing a spool file is shown below.

-- Turn on spooling
spool spool_file.txt
-- Run your query
select * from dba_table;
-- Turn off spooling
spool off;

Since SPOOL writes out everything the session produces, FEEDBACK must be turned OFF so that row-count messages are not appended to the output file. The FILE variable should also point to a file name with a .csv extension. Below is a sample script to extract data from an Oracle table.

#!/usr/bin/bash

FILE="students.csv"

sqlplus -s user_name/password@oracle_db <<EOF
SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM STUDENTS;
SPOOL OFF
EXIT
EOF

To break down some of the above script options, let’s take a look at the following definitions:

  • SET PAGESIZE: Sets the number of lines per page. The header line is repeated at the top of every page.
  • SET COLSEP: Sets the column separator.
  • SET LINESIZE: Sets the number of characters per line. Set it high enough that an entire record fits on a single line; the default is 80.
  • SET FEEDBACK OFF: Turns feedback off so that row-count messages do not appear in the CSV file.
  • SPOOL $FILE: Specifies the file to which the query results will be written.
  • SELECT * FROM STUDENTS: The query executed to extract data from the table.
  • SPOOL OFF: Stops writing the contents of the SQL session to the file.
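Note that SET COLSEP produces delimiter-separated text with column padding rather than strict CSV, so a small post-processing step is often useful. The following Python sketch (an assumption for illustration, not part of the original workflow) trims the padding and emits clean CSV:

```python
import csv
import io

def spool_to_csv(spool_text, colsep="|"):
    """Convert delimiter-separated SQL*Plus spool output into CSV rows.

    Assumes FEEDBACK was OFF, so the file holds only header and data
    lines; each field is stripped of the padding SQL*Plus adds.
    """
    rows = []
    for line in spool_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between pages
        rows.append([field.strip() for field in line.split(colsep)])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

sample = "ID |NAME  |GPA \n1  |Ada   |3.9 \n2  |Linus |3.5 \n"
print(spool_to_csv(sample))
```

This keeps the Oracle-side script simple while still producing a file Snowflake can parse with default CSV settings.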

Incremental Data Extract

Incremental extraction pulls only the records created or modified since the last data pull. Because any SQL can be run while spooling is on and its results written to the spool file, you can extract only the changed records by generating SQL with the appropriate conditions. Below is a sample query to pull data incrementally.

select * from students
where last_modified_time > last_pull_time
  and last_modified_time <= sys_time;
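As a sketch of how such a query might be generated programmatically from a stored watermark, the Python snippet below builds the incremental-extract statement. The column name, TO_TIMESTAMP format literal, and table name are illustrative assumptions; adapt them to your schema and session NLS settings.

```python
from datetime import datetime

def incremental_query(table, last_pull_time, now):
    """Build an incremental-extract SQL statement bounded by two timestamps."""
    fmt = "YYYY-MM-DD HH24:MI:SS"

    def ts(dt):
        # Render a Python datetime as an Oracle TO_TIMESTAMP literal.
        return f"TO_TIMESTAMP('{dt:%Y-%m-%d %H:%M:%S}', '{fmt}')"

    return (
        f"SELECT * FROM {table} "
        f"WHERE last_modified_time > {ts(last_pull_time)} "
        f"AND last_modified_time <= {ts(now)}"
    )

q = incremental_query("students", datetime(2022, 12, 1), datetime(2022, 12, 20))
print(q)
```

Persisting the upper bound (`now`) after each successful run gives you the `last_pull_time` for the next extraction, so no modified rows are skipped or double-counted.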

Step 2: Data Type Conversion And Formatting

When moving data from Oracle to Snowflake, it is important to convert and format certain data types to ensure a smooth data load. Oracle and Snowflake do not handle data in the same manner and a lot of attention needs to be given to such data types when creating the data pipeline. 

Below are a few of the differences that appear between Oracle and Snowflake which need to be handled correctly:

  • Character Sets: Errors can easily occur when there is a character set mismatch between Oracle and Snowflake. Therefore, the source and target location need to have similar character sets. Snowflake supports all the major character sets including UTF-8 and UTF-16.
  • Primitive and Advanced Data Types: While both Oracle and Snowflake cover most primitive and advanced data types, including nested data structures such as struct and array, they handle them differently, so each source type needs to be mapped explicitly to a target type.
  • SQL Constraints: When moving data across various big data platforms, data integrity might be compromised when the system lacks support for SQL constraints. Snowflake offers support for all SQL constraints such as PRIMARY KEY, FOREIGN KEY, NOT NULL, and UNIQUE to deliver a smooth data transition.
  • Date and Time Format: If a table uses a custom date or time format, it has to be specified during the transfer using the File Format option. Snowflake is very flexible with date and time formats, so such values can be imported easily once the format is declared.
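As a rough illustration of such a type mapping, the sketch below pairs a few common Oracle column types with plausible Snowflake equivalents. The table is an assumption for illustration only; verify each mapping against the current Snowflake documentation before relying on it in a pipeline.

```python
# Illustrative (not exhaustive) Oracle-to-Snowflake type mapping.
ORACLE_TO_SNOWFLAKE = {
    "VARCHAR2": "VARCHAR",
    "NVARCHAR2": "VARCHAR",
    "CHAR": "CHAR",
    "NUMBER": "NUMBER",
    "BINARY_FLOAT": "FLOAT",
    "DATE": "TIMESTAMP_NTZ",   # Oracle DATE carries a time component
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "CLOB": "VARCHAR",
    "BLOB": "BINARY",
    "RAW": "BINARY",
}

def map_type(oracle_type):
    """Return a Snowflake type for an Oracle type; raise if unmapped."""
    try:
        return ORACLE_TO_SNOWFLAKE[oracle_type.upper()]
    except KeyError:
        raise ValueError(f"no mapping defined for Oracle type {oracle_type!r}")

print(map_type("varchar2"))  # VARCHAR
```

Failing loudly on unmapped types, rather than guessing, is usually the safer design choice in a migration script.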

Step 3: Stage Files To S3

Before data is loaded from Oracle to Snowflake, it has to be uploaded first to a staging area. The process of loading the data to a location that Snowflake has access to is called staging. For example, if your Snowflake instance is running on AWS then the data can be uploaded to an S3 location. The staging area can either be internal or external. Let’s look at the differences between the two approaches.

Internal Stage: Each user and each table is automatically assigned an internal stage, which can be used to stage data related to that user or table. You can also create a named internal stage with SQL statements; a named stage can be given data-loading options such as file format and date format, which are applied automatically when data is loaded through it. A user's stage is referenced as @~, and a table's stage is referenced as @%table_name.

Files are uploaded to an internal stage with the PUT command. The PUT command has several options, which include:

  • Where to stage files, such as the internal table staging area of Snowflake
  • The degree of parallelism for the upload
  • Automatic compression
  • And many others

It’s important to note that PUT does not support uploading files to external stages such as Amazon S3, Google Cloud Storage, or Microsoft Azure. 

Before using PUT, make sure the target table exists in Snowflake. Then create a named internal stage and give it some default options, as shown below.

create or replace stage internal_stage_name
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);

Then use the PUT command to stage files to the internal Snowflake stage, as shown below (see the PUT command reference documentation for all options).

PUT file://path_to_your_file/your_filename @internal_stage_name

External Stage: Snowflake supports Amazon S3, Google Cloud Storage, and Microsoft Azure as external staging locations. You simply point a stage at the location where the data is stored, and it can be loaded directly into a Snowflake table through that stage. The data is not required to move to an internal stage first.

Creating and using an external stage that points to an S3 location requires IAM credentials with the proper access permissions. Additionally, if the data is encrypted, the proper keys will need to be provided for a hitch-free transfer.

Once data is extracted from the Oracle database, it can be loaded into S3 using the direct upload option or custom code that uses an AWS SDK in any supported programming language. With the data in S3, an external stage is then created to point to that location. An example of how to create an external stage is shown below.

create or replace stage oracle_ext_stage
url='s3://snowflake_oracle/data/load/files/'
credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c' aws_secret_key='aii998nnrcd4kx5y6z')
encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');

Step 4: Copy Staged Files To The Snowflake Table

After extracting data from Oracle, uploading it to S3, and creating an external Snowflake stage, the final step is copying the staged data into the target table. COPY INTO is the command used to do this. Below are sample scenarios where COPY INTO can be used.

  • To load from a named internal stage:
copy into oracle_table
from @oracle_stage;
  • To load a particular file from the external stage:
copy into my_ext_stage_table
from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
  • To copy directly from an external location without creating a stage:
copy into oracle_table
from s3://mybucket/oracle_snow/data/files
credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
file_format = (format_name = csv_format);
  • To specify files using patterns:
copy into oracle_pattern_table
from @oracle_stage
file_format = (type = 'TSV')
pattern='.*/.*/.*[.]csv[.]gz';

Commonly used options for CSV file loading when using the COPY command include:

  • DATE_FORMAT: This is used to specify the custom date format used in the file so that Snowflake can parse it properly.
  • TIME_FORMAT: This is used to specify any custom time format used in the file.
  • COMPRESSION: This is used to specify the algorithms used to compress your data.
  • RECORD_DELIMITER: This is used to indicate line separator characters.
  • FIELD_DELIMITER: This is used to indicate the character separating fields in the file.
  • SKIP_HEADER: This is used to specify the number of header lines to skip when inserting data into the table.

Other useful options for CSV file loading can be found in the Snowflake documentation.
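The options above can also be assembled into a COPY INTO statement programmatically. The Python sketch below shows one way to do it; the table, stage, and option values are illustrative assumptions, not taken from the article.

```python
def build_copy_statement(table, stage, file_format_options):
    """Assemble a COPY INTO statement with CSV file-format options.

    Sketch only: identifiers are interpolated directly into the SQL text,
    so use it with trusted, validated names.
    """
    opts = " ".join(
        f"{key} = {value}" for key, value in file_format_options.items()
    )
    return f"COPY INTO {table} FROM @{stage} file_format = (type = 'CSV' {opts})"

stmt = build_copy_statement(
    "oracle_table",
    "oracle_stage",
    {"field_delimiter": "','", "skip_header": 1, "date_format": "'YYYY-MM-DD'"},
)
print(stmt)
```

Generating the statement from a dictionary keeps per-table loading options in configuration rather than hand-edited SQL.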

Update Snowflake Table

As stated earlier, data can be extracted incrementally from the Oracle database. This results in new and updated records each time it is extracted and loaded into Snowflake. Snowflake also supports row-level data manipulations. This makes it convenient to handle data changes incrementally by extracting data into an intermediate or temporary table. Then the records can be modified in the final table to reflect changes made in the intermediate table.

There are three ways to do this: updating rows in the target table with new data, deleting rows from the target table that also exist in the landing table, and using a merge statement. Let’s dive into each of these options below.

Updating the rows in the target table with new data: This method updates rows in the target table with new data that shares the same keys, then inserts the records from the intermediate or landing table that are not yet in the final table. An example of this update and insert is shown below.

UPDATE oracle_target_table t
SET t.value = s.value
FROM landing_delta_table s
WHERE t.id = s.id;

INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_delta_table
WHERE id NOT IN (SELECT id FROM oracle_target_table);

Deleting rows from the target table which are in the landing table: This method involves deleting rows from the target table which are also in the landing table and then inserting all rows from the landing table to the final table. This allows the final table to have the current data without any duplicates. An example of deleting rows is shown below.

DELETE FROM oracle_target_table f
WHERE f.id IN (SELECT id FROM landing_table);

INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_table;

Merge statement: A MERGE statement combines the insert and update, applying the changes in the landing table to the target table in a single statement, as illustrated in the example below.

MERGE INTO oracle_target_table t1 USING landing_delta_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN
INSERT (id, value) VALUES (t2.id, t2.value);
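To make the MERGE semantics concrete, here is a small Python sketch that simulates the same upsert logic on in-memory rows keyed by id (purely illustrative, not part of the pipeline):

```python
def merge_rows(target, landing):
    """Simulate MERGE on dicts keyed by id: matched ids are updated
    with the landing value, unmatched ids are inserted."""
    merged = dict(target)
    merged.update(landing)  # landing rows win on matching ids
    return merged

target = {1: "old", 2: "keep"}
landing = {1: "new", 3: "insert"}
print(merge_rows(target, landing))  # {1: 'new', 2: 'keep', 3: 'insert'}
```

The key property, mirrored by the SQL above, is that every landing row ends up in the target exactly once, whether it replaced an existing row or was newly inserted.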

Limitations of Manual ETL Process

The following are the limitations associated with the manual ETL processes described above:

  • High Cost: This method is not cost-effective, particularly for companies with small budgets. Hiring ETL developers to build and maintain a data pipeline between Oracle and Snowflake can become expensive, which is a major deterrent for organizations considering this method.
  • Maintenance: The ETL codes will need to be updated regularly to ensure the maintenance and smooth running of the data processing system. This maintenance takes up bandwidth, may cause latency in operations, and lead to unpredictable costs.
  • Scalability: ETL systems must scale to higher volumes to avoid failures. With hand-written ETL code, scaling and optimizing the pipeline to handle growing data volumes may be limited by how it was originally built.

Conclusion

This article has shown you a few options for setting up an Oracle to Snowflake integration. We explored two methods for implementing this connectivity: the manual method using SQL*Plus and an automated method using Arcion.

The manual method of setting up an ETL pipeline can be very involved. Not only does it require knowledge of SQL*Plus, but it also requires staging files in Amazon S3 before eventually copying them into Snowflake. This multi-stage process can be tough to implement and maintain without the proper resources in place, and it can be costly and hard to scale.

All of this can be easily avoided by using a simple yet reliable method of connecting the two platforms automatically through Arcion. Arcion can be used to set up a smooth and efficient data pipeline to handle all your ETL and CDC processes without having to worry about challenges associated with the extraction and loading of data. With multiple deployment options, including Arcion Cloud and Arcion Self-Hosted, the platform is the most simple and scalable way to connect Oracle to Snowflake. To get started today, download Arcion Self-hosted or sign up for Arcion Cloud for free (no payment required), and deploy your first pipeline in a matter of minutes.

Matt is a developer at heart with a passion for data, software architecture, and writing technical content. In the past, Matt worked at some of the largest finance and insurance companies in Canada before pivoting to working for fast-growing startups.
Luke has two decades of experience working with database technologies and has worked for companies like Oracle, AWS, and MariaDB. He is experienced in C++, Python, and JavaScript. He now works at Arcion as an Enterprise Solutions Architect to help companies simplify their data replication process.
Back to Blog
Connect Oracle to Snowflake: 2 Easy Methods

Connect Oracle to Snowflake: 2 Easy Methods

Matt Tanner
Developer Relations Lead
December 20, 2022

Prerequisites

The following are the prerequisites needed to understand this write-up and to follow along with examples connecting Oracle to Snowflake:

  • Knowledge of Databases and Data Warehouses
  • Knowledge of Structure Query Language (SQL)
  • Understanding the type of data to be transferred
  • An Oracle account
  • A Snowflake account

What is Oracle Database?

Oracle is a popular Relational Database Management System (RDBMS) produced and marketed by Oracle Corporation. It is one of the most used RDBMS to store, organize, and retrieve information. Oracle was the first database to use Enterprise Grid Computing (EGC) to perform tasks. EGC helps in the retrieval of information from any location through a group of computers connected to a network. This makes Oracle a flexible and cost-effective way to manage information and applications. 

Oracle runs on multiple types of servers across different operating systems such as Windows, UNIX, Linux, and macOS. Oracle is a multi-model database system where each database is a collection of data treated as a unit to store and retrieve related information. Oracle offers in-memory, NoSQL, and MySQL databases that are available on-premise through Oracle Cloud or via Oracle Cloud.

Oracle has five editions of its database to choose from, they include Standard Edition One, Standard Edition, Enterprise Edition, Express Edition (XE), and Personal Edition. For further information on Oracle, visit here.

What is Snowflake?

Snowflake is a cloud-based Software-as-a-Service (SaaS) data warehouse and analytics platform which enables customers to analyze data. Its cloud data warehouse is built on Amazon Web Service (AWS), Microsoft Azure, and Google infrastructure, providing a platform for storing and retrieving data. It is powered by the standard ANSI SQL protocol that supports full and semi-structured data such as JSON, Parquet, XML, and many other formats. 

With Snowflake, there is no hardware or software to install, configure, or manage since the platform is fully-managed by Snowflake itself. This means organizations do not need dedicated resources for setup, maintenance, and support of the platform, as you would for in-house and physical operations. This is because everything is carried out on the cloud with a near-zero administration.

Snowflake’s unique architecture separates its storage unit from its compute unit, thereby scaling differently so customers can use and pay for both independently. It is very flexible and highly scalable to accommodate massive amounts of data and a lot of users using its computing power. For further information on Snowflake, visit the official website here.

Oracle To Snowflake Replication Advantages

Choosing the right analytics tool to use for your data warehousing needs can greatly enhance the overall performance of your business. Although there is a massive upside to using analytics tools, choosing the right one can be challenging at times. Replicating data from Oracle to Snowflake  for data analytics offers a lot of advantages over the traditional Oracle environment. Snowflake gives you a superior edge by leveraging a purpose-built big data platform that specifically excels at deriving insights through analytics. A few of the benefits of replicating data from Oracle to Snowflake are listed below:

  • Minimized Administration: Snowflake is a completely managed service. You can simply load, query, and share data easily without needing to upgrade, partition, or create indexes like traditional databases with minimal engineering resources.
  • Semi-Structured Data: Snowflake allows you to work with semi-structured data by simply pulling the semi-structured data into the data pool with ease. This can be difficult using Oracle.
  • The Flexibility And Scalability Of Loading Data: Snowflake is highly scalable and reliable. It allows the loading of large datasets from various sources such as web applications, on-premise databases and files, and cloud storage. Its elasticity gives you the liberty to scale only when needed and pay for only what you use.
  • High Computational Power: Snowflake is easy to use and has a massive amount of available computational power. This gives room for many concurrent users to run complex queries simultaneously and extremely fast query execution.

Best Ways To Set Up Oracle To Snowflake Integration

How do you export data from an Oracle database in a format that Snowflake can digest and subsequently, import the data into Snowflake? This may sound like a simple process but not knowing how to go about it can become a daunting task. Let’s dig into two methods for achieving this and how to avoid common pitfalls along the way.

Method 1: Using Arcion To Set Up Oracle To Snowflake Integration

Using Arcion allows users to use an easy-to-use UI to set up a replication pipeline using CDC. In a few steps, you’ll have Oracle and Snowflake connected to Arcion. Once connected, a snapshot of the Oracle database will be loaded into Snowflake and any subsequent changes will also be synced in real-time.

The first step is to log into Arcion and set up a new replication by clicking the New button on the Replications screen.

The next step is to choose a Replication mode and a Write mode. Once selected, click the Next button.

> For more details on the different Replication and Write modes, check out our docs.

On the  Source screen, select Oracle as the source. If no Oracle Source is defined, click Create New and follow the instructions to create one. Once selected, click Continue to Destination.

On the Destination screen, select Snowflake as the target destination. If no Snowflake Destination is defined, click New Connection and follow the instructions to create one. Once selected, click Continue to Filter.

Next, on the Filter screen, we will choose what schemas, tables, and columns we want to replicate from Oracle to Snowflake. Choose how you would like to filter the data by checking off the applicable checkboxes on the left and clicking Start Replication.

From here, the replication will start. This next screen will allow you to monitor the process and inspect any errors that occur.

Because we selected Snapshot Mode as the replication type, once the data has been moved from Oracle to Snowflake, Arcion will shut down the process. If you would like to consistently replicate all changes through CDC, you can choose Full or Real-time mode. Once our Snapshot job completes, we will see a summary.

With that, you have connected Oracle to Snowflake using Arcion and migrated your data. 

Method 2: Manual ETL Process To Set Up Oracle To Snowflake Integration

There is no direct link for loading data between Oracle and Snowflake. Therefore, a mediator that acts as a connector is used to join the two platforms and implement data replication between them. There are several ways to export data from an Oracle database, but you have to consider exporting it into a format that would not require further transformations before importing it to Snowflake. There are many possible formats like CSV, XML, and JSON. In this example, we will focus on CSV since it is the simplest way.

The steps described below are the steps and processes involved in moving data from Oracle to Snowflake using a CSV file.

Step 1: Extract Data From Oracle To CSV Using SQL*Plus

SQL*Plus is a query tool used to query and redirect the result of an SQL query to a CSV file. It is installed with every Oracle Database Server or Client installation. The command that will be used to direct the SQL query result to a CSV file is called Spool. The Spool file is not visible until the command is turned off. If the Spool file does not exist, a new file is created for it. If it does exist, it will be overwritten. An example of writing a Spool file is shown below.

-- Turn on the spool    
 spool spool_file.txt 
-- Run your Query    
 select  * from dba_table; 
-- Turn of spooling    
 spool off; 

Since the Spool functionality is used to write out files, FEEDBACK will have to be turned off so that the output of the row numbers will not be appended to the output file. FILE will also need to be set to CSV format. Below is a sample script to extract data from an Oracle table.

#!/usr/bin/bash 

FILE="students.csv"  

sqlplus -s user_name/password@oracle_db <<EOF

SET PAGESIZE 35000

SET COLSEP "|" 

SET LINESIZE 230 

SET FEEDBACK OFF  

SPOOL $FILE 

SELECT * FROM EMP;  

SPOOL OFF 

EXIT 

EOF#!/usr/bin/bash  

FILE="emp.csv"  

sqlplus -s scott/tiger@XE <<EOF

SET PAGESIZE 50000

SET COLSEP "," 

SET LINESIZE 200 

SET FEEDBACK OFF  

SPOOL $FILE  

SELECT * FROM STUDENTS;  

SPOOL OFF 

EXIT 

EOF

To break down some of the above script options, let’s take a look at the following definitions:

  • SET PAGESIZE: This sets the number of lines per page. The header line is repeated on every page, so a large value keeps the header from reappearing mid-file.
  • SET COLSEP: This sets the column separator.
  • SET LINESIZE: This sets the number of characters per line. It should be large enough that an entire record fits on a single line; the default is 80.
  • SET FEEDBACK OFF: This prevents row-count messages from being written into the CSV file.
  • SPOOL $FILE: This specifies the file the query results will be written to.
  • SELECT * FROM STUDENTS: This is the query executed to extract data from the table.
  • SPOOL OFF: This stops writing the contents of the SQL session to the file.

Incremental Data Extract

This option extracts only the records modified since the last data pull. Once spooling is turned on, any SQL can be run and its results written to the specified file, so changed records alone can be extracted by generating SQL with appropriate conditions. Below is a sample query to pull data incrementally.

select * from students
where last_modified_time > last_pull_time
  and last_modified_time <= sys_time;
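One common way to drive this is to persist the timestamp of the last successful pull and generate the bounded query from it. The sketch below is a hedged illustration; the bookmark file name, table name, and column names are assumptions, not part of Oracle or Snowflake tooling.

```python
from datetime import datetime, timezone
from pathlib import Path

BOOKMARK = Path("last_pull_time.txt")  # hypothetical bookmark file

def build_incremental_query(table="students", ts_col="last_modified_time"):
    """Build a bounded incremental SELECT; returns (sql, upper_bound)."""
    # Default to the epoch on the very first pull, when no bookmark exists yet.
    last_pull = BOOKMARK.read_text().strip() if BOOKMARK.exists() else "1970-01-01T00:00:00"
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {ts_col} > TO_TIMESTAMP('{last_pull}', 'YYYY-MM-DD\"T\"HH24:MI:SS') "
        f"AND {ts_col} <= TO_TIMESTAMP('{now}', 'YYYY-MM-DD\"T\"HH24:MI:SS')"
    )
    return sql, now

def commit_bookmark(upper_bound):
    """After a successful load, advance the bookmark to the upper bound."""
    BOOKMARK.write_text(upper_bound)
```

Using a fixed upper bound captured before extraction (rather than re-reading the clock later) keeps each pull's window closed, so rows modified during the extract are picked up by the next run instead of being silently skipped.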

Step 2: Data Type Conversion And Formatting

When moving data from Oracle to Snowflake, it is important to convert and format certain data types to ensure a smooth data load. Oracle and Snowflake do not handle data in the same manner and a lot of attention needs to be given to such data types when creating the data pipeline. 

Below are a few of the differences that appear between Oracle and Snowflake which need to be handled correctly:

  • Character Sets: Errors can easily occur when there is a character set mismatch between Oracle and Snowflake. Therefore, the source and target location need to have similar character sets. Snowflake supports all the major character sets including UTF-8 and UTF-16.
  • Primitive and Advanced Data Types: Both Oracle and Snowflake cover most primitive and advanced data types, including nested structures such as struct and array, but they implement them differently, so each Oracle column type must be mapped to its Snowflake equivalent.
  • SQL Constraints: When moving data across various big data platforms, data integrity might be compromised when the system lacks support for SQL constraints. Snowflake offers support for all SQL constraints such as PRIMARY KEY, FOREIGN KEY, NOT NULL, and UNIQUE to deliver a smooth data transition.
  • Date and Time Format: If a custom date or time format is used in a particular table, it has to be specified during the transfer using the File Format Option. Snowflake is flexible with date and time formats, and such data can be imported easily once the format is declared.
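As a starting point for generating the Snowflake DDL, a simple lookup table of common equivalents can drive the conversion. The mapping below is a hedged sketch of frequently used pairings, not an exhaustive or authoritative list; always verify precision and scale against your actual schema.

```python
# Common Oracle -> Snowflake type equivalents (illustrative, not exhaustive).
ORACLE_TO_SNOWFLAKE = {
    "VARCHAR2": "VARCHAR",
    "NVARCHAR2": "VARCHAR",
    "CHAR": "CHAR",
    "NUMBER": "NUMBER",
    "FLOAT": "FLOAT",
    "BINARY_DOUBLE": "DOUBLE",
    "DATE": "TIMESTAMP_NTZ",  # Oracle DATE carries a time component
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "CLOB": "VARCHAR",
    "BLOB": "BINARY",
    "RAW": "BINARY",
}

def to_snowflake_type(oracle_type: str) -> str:
    """Map an Oracle column type to a Snowflake equivalent, defaulting to VARCHAR."""
    return ORACLE_TO_SNOWFLAKE.get(oracle_type.upper(), "VARCHAR")
```

Note in particular that Oracle's DATE includes a time-of-day portion, which is why it is mapped to a timestamp type here rather than to Snowflake's DATE.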

Step 3: Stage Files To S3

Before data is loaded from Oracle to Snowflake, it has to be uploaded first to a staging area. The process of loading the data to a location that Snowflake has access to is called staging. For example, if your Snowflake instance is running on AWS then the data can be uploaded to an S3 location. The staging area can either be internal or external. Let’s look at the differences between the two approaches.

Internal Stage: Snowflake automatically allocates an internal stage to each user and each table for staging data related to that user or table. A named internal stage can also be created explicitly with SQL, and data-loading options such as file format and date format can be assigned to it; these options are then applied automatically when data is loaded through the stage. A user's stage is referenced as @~, and a table's stage is referenced as @% followed by the table name.

Files are uploaded into an internal stage with the PUT command. The PUT command has many options which include:

  • Where to stage files, such as the internal table staging area of Snowflake
  • The degree of parallelism for the upload
  • Automatic compression
  • And many others

It’s important to note that PUT does not support uploading files to external stages such as Amazon S3, Google Cloud Storage, or Microsoft Azure. 

Before using PUT, first create a named internal stage and assign it some default options as shown below.

create or replace stage internal_stage_name
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);

Then use the PUT command to stage files to the internal Snowflake stage, as shown below.

PUT file://path_to_your_file/your_filename @internal_stage_name;

External Stage: Snowflake supports Amazon S3, Google Cloud Storage, and Microsoft Azure as external staging locations. Simply point the stage at the location where the data is stored, and it can be loaded directly into the Snowflake table; the data does not need to be moved to an internal stage first.

Creating and using an external staging area that points to an S3 location will require IAM credentials with access permissions. Additionally, if the data is to be decrypted, proper keys will also need to be provided to have a hitch-free transfer. 

When data is extracted from the Oracle database, it can then be loaded to S3 using the direct upload option or custom code that uses an AWS SDK in any supported programming language. Once the data is loaded to S3, an external stage is then created to point to the location. An example of how to create an external stage is shown below.

create or replace stage oracle_ext_stage
url='s3://snowflake_oracle/data/load/files/'
credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c' aws_secret_key='aii998nnrcd4kx5y6z')
encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');

Step 4: Copy Staged Files To The Snowflake Table

Once the data extracted from Oracle has been staged, either internally or in an external location such as S3, the final step is to copy it into the target Snowflake table. COPY INTO is the command used to do this. Below are sample scenarios where COPY INTO can be used.

  • To load from a named internal stage:
copy into oracle_table
from @oracle_stage;
  • To load a particular file from the external stage:
copy into my_ext_stage_table
from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
  • To copy directly from an external location without creating a staging area:
copy into oracle_table
from 's3://mybucket/oracle_snow/data/files'
credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
file_format = (format_name = csv_format);
  • To specify files using patterns:
copy into oracle_pattern_table
from @oracle_stage
file_format = (type = 'CSV')
pattern='.*/.*/.*[.]csv[.]gz';

Commonly used options for CSV file loading when using the COPY command include:

  • DATE_FORMAT: This is used to specify the custom date format used in the file so that Snowflake can parse it properly.
  • TIME_FORMAT: This is used to specify any custom time format used in the file.
  • COMPRESSION: This is used to specify the algorithms used to compress your data.
  • RECORD_DELIMITER: This is used to indicate line separator characters.
  • FIELD_DELIMITER: This is used to indicate the character separating fields in the file.
  • SKIP_HEADER: This is used to show the number of header lines to be skipped while inserting data into the table.

Other useful options for CSV file loading can be found here.
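When several of these options are needed at once, it can be convenient to assemble the COPY statement programmatically. Below is a small hedged sketch of such a helper; the function, table, and stage names are illustrative and not part of any Snowflake client library.

```python
def build_copy_statement(table, stage, file_format_options):
    """Assemble a COPY INTO statement with inline CSV file-format options."""
    opts = " ".join(f"{k} = {v}" for k, v in file_format_options.items())
    return f"copy into {table} from @{stage} file_format = (type = 'CSV' {opts});"

# Example: skip one header line, use comma-delimited fields, and declare the date format.
stmt = build_copy_statement(
    "oracle_table",
    "oracle_stage",
    {"SKIP_HEADER": "1", "FIELD_DELIMITER": "','", "DATE_FORMAT": "'YYYY-MM-DD'"},
)
```

The resulting string can then be executed through whichever Snowflake client or connector the pipeline already uses.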

Update Snowflake Table

As stated earlier, data can be extracted incrementally from the Oracle database. This results in new and updated records each time it is extracted and loaded into Snowflake. Snowflake also supports row-level data manipulations. This makes it convenient to handle data changes incrementally by extracting data into an intermediate or temporary table. Then the records can be modified in the final table to reflect changes made in the intermediate table.

There are three ways this can be done: updating the rows in the target table with new data, deleting rows from the target table that are also in the landing table, and using a merge statement. Let’s dive into each of these options below.

Updating the rows in the target table with new data: This method updates rows in the target table with new data that has the same keys, then inserts the records from the intermediate or landing table that are not yet in the final table. An example of this update and insert is shown below.

UPDATE oracle_target_table t
SET value = ld.value
FROM landing_delta_table ld
WHERE t.id = ld.id;

INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_delta_table
WHERE id NOT IN (SELECT id FROM oracle_target_table);

Deleting rows from the target table which are in the landing table: This method involves deleting rows from the target table which are also in the landing table and then inserting all rows from the landing table to the final table. This allows the final table to have the current data without any duplicates. An example of deleting rows is shown below.

DELETE FROM oracle_target_table
WHERE id IN (SELECT id FROM landing_table);

INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_table;

Merge statement: A merge statement combining inserts and updates can be used to apply the changes in the landing table to the target table in one statement, as illustrated in the example below.

MERGE INTO oracle_target_table t1 USING landing_delta_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN
INSERT (id, value) VALUES (t2.id, t2.value);
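The semantics of the MERGE above, matched ids are updated and unmatched ids are inserted, can be illustrated with a small Python simulation of the upsert on key-to-value mappings. This is purely an illustration of the logic, not part of any Snowflake API.

```python
def merge_rows(target: dict, delta: dict) -> dict:
    """Simulate MERGE semantics: matched ids are updated, unmatched ids are inserted."""
    merged = dict(target)   # leave the original target untouched
    merged.update(delta)    # dict.update applies exactly the matched/not-matched rule
    return merged

# Rows already in the target table, keyed by id.
target = {1: "alice", 2: "bob"}
# Delta rows from the landing table: id 2 changed, id 3 is new.
delta = {2: "bobby", 3: "carol"}
```

Running `merge_rows(target, delta)` yields a result where id 1 is unchanged, id 2 carries the updated value, and id 3 has been inserted, which is the same end state the single MERGE statement produces.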

Limitations of Manual ETL Process

The following are the limitations associated with the manual ETL processes described above:

  • High Cost: This method is not cost-effective, especially for companies with smaller budgets. Hiring an ETL developer to build and maintain a data pipeline between Oracle and Snowflake can become expensive, which can be a major detractor for organizations considering this method.
  • Maintenance: The ETL codes will need to be updated regularly to ensure the maintenance and smooth running of the data processing system. This maintenance takes up bandwidth, may cause latency in operations, and lead to unpredictable costs.
  • Scalability: It is pertinent that ETL systems can scale to higher volumes to avoid failures. The manual ETL code approach can become a problem as you may not be able to build a pipeline to oversee such a large data increase. Scaling and optimizing the pipeline to handle increasing data volumes may be somewhat limited depending on how it was built.

Conclusion

This article has shown you a few options for setting up an Oracle to Snowflake integration. We explored two methods for implementing this connectivity: the manual method using SQL*Plus and an automated method using Arcion.

The manual method of setting up an ETL pipeline can be very confusing. Not only does it require knowledge of SQL*Plus, it also requires staging files to Amazon S3 before eventually copying them into Snowflake. This multi-stage process can be tough to implement and maintain without the proper resources in place, and it can be costly and hard to scale.

All of this can be easily avoided by using a simple yet reliable method of connecting the two platforms automatically through Arcion. Arcion can be used to set up a smooth and efficient data pipeline to handle all your ETL and CDC processes without having to worry about challenges associated with the extraction and loading of data. With multiple deployment options, including Arcion Cloud and Arcion Self-Hosted, the platform is the most simple and scalable way to connect Oracle to Snowflake. To get started today, download Arcion Self-hosted or sign up for Arcion Cloud for free (no payment required), and deploy your first pipeline in a matter of minutes.
