Tables can be exported from Panoply to Amazon S3 using Redshift's UNLOAD command.
Using the UNLOAD command to export data to S3 is simple. Here's an example statement:
UNLOAD ('SELECT * FROM "<schema>"."<table>"')
TO 's3://<bucket_name>/<folder_name>/'
CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>'
REGION '<bucket region>'
HEADER -- Adds headers to file
DELIMITER '|' -- Sets Delimiter to |
ADDQUOTES -- Wraps each value in quotes
ESCAPE -- Escapes special characters
PARALLEL OFF -- Exports data to a single file, will split if the file size is larger than 6.2GB
GZIP; -- Compresses the data
In the statement above, replace the following placeholder names with your own values:
schema - The name of the schema where your source table belongs.
table - The name of the table you want to get data from.
bucket_name - The name of the S3 bucket where your table data will be exported to.
folder_name - The name of the folder within the bucket where the exported files will be placed.
aws_access_key_id - Your AWS Access Key.
aws_secret_access_key - Your AWS Access Secret.
bucket region - The code for the AWS region where your bucket resides.
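To make the substitutions concrete, here is a minimal sketch that fills the UNLOAD template with example values. All of the values below (schema, table, bucket, keys, region) are hypothetical placeholders, not real credentials:

```python
# Fill in the UNLOAD template with hypothetical example values.
template = (
    "UNLOAD ('SELECT * FROM \"{schema}\".\"{table}\"') "
    "TO 's3://{bucket}/{folder}/' "
    "CREDENTIALS 'aws_access_key_id={access_key};aws_secret_access_key={secret_key}' "
    "REGION '{region}' "
    "HEADER DELIMITER '|' ADDQUOTES ESCAPE PARALLEL OFF GZIP;"
)

statement = template.format(
    schema="public",            # hypothetical schema name
    table="events",             # hypothetical table name
    bucket="my-export-bucket",  # hypothetical S3 bucket
    folder="events",            # hypothetical destination folder
    access_key="<aws_access_key_id>",      # your AWS Access Key
    secret_key="<aws_secret_access_key>",  # your AWS Access Secret
    region="us-east-1",         # hypothetical bucket region
)
print(statement)
```

The printed statement is what you would paste into SQL Workbench/J after substituting your real credentials.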
Once you've replaced the placeholder names with your own values, execute the statement using an external workbench tool. We recommend SQL Workbench/J.
The UNLOAD statement results in a compressed, CSV-like file exported to your S3 account. The compressed file contains the records of your source table.
When viewed in an Amazon S3 account, the export appears inside the destination folder (test in this example) as a 000.gz file, which is the result of executing the sample UNLOAD statement.
NOTE: Refrain from executing the same UNLOAD statement more than once. Unless you provide a different folder_name each time you run the command, expect a file overwrite error to appear. To prevent this error, include the ALLOWOVERWRITE option in the UNLOAD statement.
Exporting all the tables in a Panoply data warehouse
Below are the steps to export all the tables in your Panoply account. To run the queries in these steps, we recommend using an external workbench tool such as SQL Workbench/J rather than the Workbench tool in Panoply. This is because the Panoply Workbench has a default timeout of two minutes, and executing UNLOAD statements can take longer than that, especially when a query returns a large amount of data.
Start by getting a list of all the tables in your Panoply data warehouse (DWH). Use the query below and execute it in SQL Workbench/J.
SELECT schemaname::varchar "schema"
     , tablename::varchar "table"
     , '"' || schemaname::varchar || '"."' || tablename::varchar || '"' "full_obj_name"
FROM pg_tables
WHERE tableowner != 'rdsdb';
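The concatenation in the third column simply wraps the schema and table names in double quotes, producing a fully qualified name that can be dropped straight into an UNLOAD statement's SELECT. A quick sketch of the same logic, with hypothetical names:

```python
# Mirror of the SQL concatenation that builds full_obj_name:
#   '"' || schemaname || '"."' || tablename || '"'
def full_obj_name(schema, table):
    return '"' + schema + '"."' + table + '"'

# Hypothetical example values:
print(full_obj_name("public", "users"))  # → "public"."users"
```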
As defined in the query, three columns are returned: schema, table, and full_obj_name. Take note of these columns as we'll get back to them in the subsequent steps.
- Next, export the query results to a CSV file and save it locally to your computer.
- Then, import the exported CSV file to Excel or Google Sheets. Once imported, your spreadsheet must look similar to this:
The column headers, schema, table, and full_obj_name, must be in cells A1, B1, and C1, respectively.
Now, copy and paste the function below into cell D2 of your spreadsheet. Replace <bucket_name> with your own S3 bucket, and <aws_access_key_id> and <aws_secret_access_key> with your S3 credentials.
=CONCATENATE("UNLOAD ('SELECT * FROM ", C2, "') TO 's3://<bucket_name>/", A2, "/", B2, "' CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>' HEADER DELIMITER '|' ADDQUOTES ESCAPE PARALLEL OFF GZIP;")
The function generates a valid UNLOAD statement based on the template shown in the previous step. It uses the values of cells A2 (schema) and B2 (table) to define the S3 destination path, and C2 (full_obj_name) to define which source table is exported.
Of course, you are free to customize the UNLOAD statement inside the function to fit your needs.
- Copy the function to the other cells in column D by dragging it down using the fill cursor of Excel or Google Sheets. This generates an UNLOAD statement for each table in your list.
Once done, execute each statement in SQL Workbench/J. You can run the statements in groups of 10-20 at a time to monitor the process, or one at a time with an automation tool.
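As an alternative to the spreadsheet, the same per-table UNLOAD statements can be generated with a short script. This is a minimal sketch, assuming the query results were exported to a CSV file (named tables.csv here, hypothetically) with schema, table, and full_obj_name columns; the bucket name and credential placeholders are left for you to fill in:

```python
import csv
import io

# Same template as the spreadsheet CONCATENATE function; placeholders
# (<bucket_name>, <aws_access_key_id>, <aws_secret_access_key>) are left as-is.
TEMPLATE = (
    "UNLOAD ('SELECT * FROM {full_obj_name}') "
    "TO 's3://<bucket_name>/{schema}/{table}' "
    "CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;"
    "aws_secret_access_key=<aws_secret_access_key>' "
    "HEADER DELIMITER '|' ADDQUOTES ESCAPE PARALLEL OFF GZIP;"
)

def generate_statements(csv_file):
    """Yield one UNLOAD statement per row of the exported table list."""
    for row in csv.DictReader(csv_file):
        yield TEMPLATE.format(
            full_obj_name=row["full_obj_name"],
            schema=row["schema"],
            table=row["table"],
        )

# Hypothetical example input; normally you would pass open("tables.csv").
sample = io.StringIO(
    "schema,table,full_obj_name\n"
    'public,users,"""public"".""users"""\n'
)
for stmt in generate_statements(sample):
    print(stmt)
```

Each generated statement can then be run in SQL Workbench/J, just like the spreadsheet-generated ones.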