If you are already familiar with the way Panoply ingests nested data into tables and subtables, you are probably looking for a way to identify your parent-child tables relationship.
Here is a query you can use to show the relationships between parent and child tables and the corresponding keys you can join the child table to the parent table.
SELECT child_table,
c_id."column" AS child_key,
parent_table,
p_id."column" AS parent_key,
source_id
FROM (SELECT c.relname AS child_table,
SPLIT_PART(REPLACE(regexp_substr (d.description,'"sourceId":"[^"]*"'),'"',''),':',2) AS source_id,
SPLIT_PART(REPLACE(regexp_substr (d.description,'"parent":"[^"]*"'),'"',''),'public-',2) AS parent_table
FROM pg_description d
JOIN pg_class c ON c.oid = d.objoid) AS table_relationships
JOIN pg_table_def c_id
ON table_relationships.child_table = c_id.tablename
AND c_id."column" ILIKE '%' ||parent_table|| '\\_id'
JOIN pg_table_def p_id
ON table_relationships.parent_table = p_id.tablename
AND p_id."column" = 'id'
WHERE parent_table <> ''
ORDER BY child_table;
This is what the query result looks like:
child_table |
child_key |
parent_table |
parent_key |
source_id |
google-analytics_source_parsed |
google-analytics_id |
google-analytics |
id |
5e0f9d1480c73300081ce9c1 |
salesforce_account_attributes |
salesforce_account_id |
salesforce_account |
id |
5d55bf1ba420240008576a5b |
salesforce_account_billingaddress |
salesforce_account_id |
salesforce_account |
id |
5d55bf1ba420240008576a5b |
salesforce_account_shippingaddress |
salesforce_account_id |
salesforce_account |
id |
5d55bf1ba420240008576a5b |
salesforce_opportunity_attributes |
salesforce_opportunity_id |
salesforce_opportunity |
id |
5d55bf1ba420240008576a5b |
You can use the result shown above to write a query that joins the data from a parent table to its child tables. Refer to the parent_key
and child_key
columns to know the respective columns where the parent and child table must match together.
For example, to join the google-analytics
parent table with its google-analytics_source_parsed
child table, you would write a query similar to the one below. Note that the TOP <number>
clause is used to limit the number of rows returned.
SELECT TOP 10 *
FROM "google-analytics" ga
JOIN "google-analytics_source_parsed" gasp
ON ga.id = gasp."google-analytics_id";
Comments
0 comments
Please sign in to leave a comment.