Tables can be exported from Panoply to Amazon S3 using Redshift's UNLOAD command. Here's an example statement showing how to use it:
UNLOAD ('SELECT * FROM "<schema>"."<table>"')
TO 's3://<bucket_name>/<folder_name>/' CREDENTIALS
'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>'
REGION '<bucket region>'
HEADER -- Adds a header row to the file
DELIMITER '|' -- Sets the delimiter to |
ADDQUOTES -- Wraps each value in quotes
ESCAPE -- Escapes special characters
PARALLEL OFF -- Exports to a single file; Redshift splits the output if it exceeds 6.2 GB
GZIP; -- Compresses the data
In the statement above, replace the following placeholders with your own values:
schema - The name of the schema where your source table belongs. This is usually the public schema.
table - The name of the table you want to export data from.
bucket_name and folder_name - The names of the S3 bucket and folder where your table data will be exported.
aws_access_key_id - Your AWS access key ID.
aws_secret_access_key - Your AWS secret access key.
bucket region - The code for the AWS region where your bucket resides (for example, us-east-1).
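For example, with hypothetical values filled in (public.users as the source table, my-export-bucket as the bucket and users as the folder, and dummy credentials; substitute your own), the statement might look like this:
UNLOAD ('SELECT * FROM "public"."users"')
TO 's3://my-export-bucket/users/' CREDENTIALS
'aws_access_key_id=AKIAEXAMPLE;aws_secret_access_key=EXAMPLESECRETKEY'
REGION 'us-east-1'
HEADER
DELIMITER '|'
ADDQUOTES
ESCAPE
PARALLEL OFF
GZIP;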
Once you've replaced the placeholder names with your own values, execute the statement using an external workbench tool. We recommend SQL Workbench/J.
Executing the UNLOAD statement exports a compressed, CSV-like file to your S3 bucket. The compressed file contains the records of your source table.
Here's a screenshot of how it looks when viewed in an Amazon S3 account. The S3 bucket is capsuapp, and test is the destination folder. The 000.gz file is the result of executing the sample UNLOAD statement.
NOTE: Avoid executing the same UNLOAD statement more than once. Unless you provide a different bucket_name or folder_name each time you run the command, the statement fails with a file overwrite error. To prevent this, add the ALLOWOVERWRITE option to the UNLOAD statement, as shown below.
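For example, here's the sample statement from above with ALLOWOVERWRITE added (the placeholders are the same as before):
UNLOAD ('SELECT * FROM "<schema>"."<table>"')
TO 's3://<bucket_name>/<folder_name>/' CREDENTIALS
'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>'
REGION '<bucket region>'
ALLOWOVERWRITE -- Overwrites existing files at the destination instead of failing
HEADER
DELIMITER '|'
ADDQUOTES
ESCAPE
PARALLEL OFF
GZIP;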
Exporting all the tables in a Panoply data warehouse
Below are the steps to export all the tables in your Panoply account. To run the queries in these steps, we recommend using an external workbench tool like SQL Workbench/J instead of the Workbench tool in Panoply. This is because the Panoply Workbench has a default timeout of two minutes, and executing UNLOAD statements can take longer than that, especially when a query returns large amounts of data.
- Start by getting a list of all the tables in your Panoply data warehouse (DWH). Use the query below and execute it in SQL Workbench/J.
SELECT schemaname::varchar "schema"
, tablename::varchar "table"
, '"' + schemaname::varchar + '"."' + tablename::varchar + '"' "full_obj_name"
FROM pg_tables
WHERE tableowner != 'rdsdb';
As defined in the query, three columns are returned: schema, table, and full_obj_name. Take note of these columns, as we'll come back to them in the subsequent steps.
- Next, export the query results to a CSV file and save it locally to your computer.
- Then, import the exported CSV file into Excel or Google Sheets. Once imported, the column headers schema, table, and full_obj_name must be in cells A1, B1, and C1, respectively.
- Now, copy and paste the function below into cell D2 of your spreadsheet. Replace <bucket_name> with your own S3 bucket, and <aws_access_key_id> and <aws_secret_access_key> with your AWS credentials.
=CONCATENATE("UNLOAD ('SELECT * FROM ", C2, "') TO 's3://<bucket_name>/", A2, "/", B2, "' CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>' HEADER DELIMITER '|' ADDQUOTES ESCAPE PARALLEL OFF GZIP;")
The function generates a valid UNLOAD statement based on the template shown earlier; see the example after this list. It uses the values of cells A2 (schema), B2 (table), and C2 (full_obj_name) to define the S3 destination path and the source table to export. Of course, you are free to customize the UNLOAD statement inside the function to fit your needs.
- Copy the function to the other cells in column D by dragging it down with the fill handle in Excel or Google Sheets. This generates an UNLOAD statement for each table in your list.
- Once done, execute each statement in SQL Workbench/J. You can run the statements in groups of 10-20 at a time to monitor the process, or one by one with some kind of automation tool.
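For reference, assuming a hypothetical first row where A2 is public, B2 is users, and C2 is "public"."users", the function in D2 would generate a statement like the following (shown wrapped across lines for readability; the credential and bucket placeholders still need your values):
UNLOAD ('SELECT * FROM "public"."users"')
TO 's3://<bucket_name>/public/users'
CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>'
HEADER DELIMITER '|' ADDQUOTES ESCAPE PARALLEL OFF GZIP;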