Oracle Table to BigQuery schema

Migrate an Oracle Table structure to Google BigQuery

Posted by Rookpoint on Thursday, June 18, 2020

Overview

We can generate a BigQuery Table schema by utilising the Oracle data dictionary metadata to create the equivalent structure in JSON format.

Schema creation wizard

An easy way to achieve this is to use the BigQuery Schema Generator wizard in the Rookpoint Integration Tools application.

Upload a standard Oracle table definition script and the generator will turn it into a JSON schema with a command that can be executed from the bq command-line to create the BigQuery Table.

Oracle Table definition

The wizard requires a CSV file in a standard format that contains all of the necessary Oracle data dictionary metadata for the Table.

See this article for an example of generating a CSV file in Oracle.

The CSV file structure needs to be in the format shown below. This example shows the file structure for the EMP table.

"COLUMN_ID","COLUMN_NAME","DATA_TYPE","DATA_LENGTH","DATA_PRECISION","DATA_SCALE","NULLABLE"
1,"EMPNO","NUMBER",22,4,0,"N"
2,"ENAME","VARCHAR2",10,,,"Y"
3,"JOB","VARCHAR2",9,,,"Y"
4,"MGR","NUMBER",22,4,0,"Y"
5,"HIREDATE","DATE",7,,,"Y"
6,"SAL","NUMBER",22,7,2,"Y"
7,"COMM","NUMBER",22,7,2,"Y"
8,"DEPTNO","NUMBER",22,2,0,"Y"

In order to generate the file in the appropriate format, the following SQL script can be used. The first two SET lines are specifically for using SQL Developer or SQLcl, as they can produce CSV output directly, however it is also possible to use SQL Plus or your preferred tool.

SET feedback off
SET sqlformat csv

select
col.column_id,
col.column_name,
col.data_type,
col.data_length,
col.data_precision,
col.data_scale,
col.nullable
from user_tables tab,
      user_tab_columns col
where upper(tab.table_name) = upper('TABLE NAME HERE')
  and tab.table_name        = col.table_name
order by col.column_id;

The CSV file output will appear in the Script Output window of SQL Developer.

BigQuery Configuration

This part of the wizard allows you to tailor your output for the type of BigQuery Table that you wish to create. It also allows you to provide the GCP details for your own Project (if you know them). Defaults are put in place if this information is not known. Specify the environment (Windows or Linux) in which you will be running the CLI command.

Output Results

The wizard will generate a JSON schema definition and a BigQuery CLI command. Click the Download button to save the schema to your local machine.

Copy and paste the CLI command into the command line client that you will be using, ensuring that the last part of the command string is pointing to the location in which you have saved your JSON schema file.

Example

You are using the Google Cloud SDK on a windows machine, so you will need to specify Windows as the generation target. For this example we have selected Partitioning with an Ingestion Time by Hour, and specified our GCP target as project1:dataset and a table with a name of tablename.

Ensure that the JSON script that you generated and downloaded (tablename.json) is located in the D:\temp\ directory of your local machine. Now open a command line window, ensure that you are connected and authorised to your Google account and run the command.

  bq mk --table --description "Tablename" --label from_oracle_def --time_partitioning_type=HOUR project1:dataset1.tablename D:\temp\tablename.json