Query for 600,000 records takes 30 minutes to run!?
Hello Panoply pals, I'm really struggling with a query performance issue and I'm hoping someone can give me some additional things to try. I have a table with 600,000 transactions and a view to calculate the latest status for each person in the table. The view returns 1 record per person and takes about 90 seconds to run. When I join the table to the view (so my users can select current_status from a filter and see the transaction history for folks with that status), it takes 30 minutes to run! What can I do to speed things up?
Technical notes:
- I'm using person id for the join, which is configured as the DISTKEY on the transactions table
- probably not applicable for this question, but the transaction date is configured as the SORTKEY
- the current status view returns 4 columns
- the view that joins the transactions table to the current status view returns 30 columns
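For context, the query shape is roughly the following (table, view, and column names here are simplified placeholders, not my actual schema):

SELECT t.*                          -- the combined view exposes ~30 columns
FROM transactions t
JOIN current_status_view s          -- one row per person, 4 columns
  ON s.person_id = t.person_id      -- join key is the DISTKEY on transactions
WHERE s.current_status = 'Active';  -- status chosen by the user in a filter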
Official comment
There are a few reasons a query can run this long (or not complete at all):
- The JOIN conditions being used multiply the number of records by an unreasonable factor
- The amount of data being returned overloads the memory of the receiving client (workbench or BI tool)
In this specific case, it was the latter that caused the issue. Keep in mind that most tools are not built to handle and display thousands upon thousands of records; in most cases the best option is to add a transformation at the query level that aggregates the results into the desired end (or near-end) result you are looking for.
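As a rough sketch of that kind of aggregation (table, view, and column names are assumptions here, not the actual schema), a query like the one below returns one summary row per person for the selected status instead of the full transaction history:

SELECT t.person_id,
       s.current_status,
       COUNT(*)                AS transaction_count,
       MAX(t.transaction_date) AS last_transaction_date
FROM transactions t
JOIN current_status_view s
  ON s.person_id = t.person_id
WHERE s.current_status = 'Active'
GROUP BY t.person_id, s.current_status;

Aggregating inside the warehouse keeps the heavy lifting in Panoply and sends only the summarized rows to the workbench or BI tool, which is the volume those tools are designed to display.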