Our data sources create a field named ID even if there is a defined primary key. Primary keys are a table property used to ensure the uniqueness of the rows in the table.
We use the ID field when we ingest data into the table. Based on this field, the platform knows which records to update and which to insert. All write operations in Panoply are, in essence, upsert operations.
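To illustrate what an upsert keyed on ID means in practice, here is a minimal sketch in Python. It is not Panoply's implementation: the in-memory dictionary standing in for a table and the function name upsert are assumptions made purely for illustration.

```python
def upsert(table, records, key="ID"):
    """Update rows whose key already exists in the table; insert the rest.

    `table` is a plain dict mapping key value -> row, used here as a
    stand-in for a real destination table (an assumption for this sketch).
    Returns a tuple of (inserted_count, updated_count).
    """
    inserted = updated = 0
    for record in records:
        k = record[key]
        if k in table:
            # Key already present: merge the new values into the existing row.
            table[k].update(record)
            updated += 1
        else:
            # Key not seen before: add a new row.
            table[k] = dict(record)
            inserted += 1
    return inserted, updated


table = {}
first_run = upsert(table, [{"ID": 1, "name": "Ada"}, {"ID": 2, "name": "Bob"}])
second_run = upsert(table, [{"ID": 2, "name": "Robert"}, {"ID": 3, "name": "Cy"}])
```

In this sketch the second run updates the row with ID 2 and inserts the row with ID 3, so the table ends up with three rows rather than four.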
The contents of the ID field change based on the following:
- If there is a field named ID in the original data, we store that value.
- If there is no ID field and no primary key (not even a default primary key, like in file system sources or some databases), then the data source generates a UUID for each record and stores it in the ID field.
  - Every time the data source runs, all the records are inserted and duplications might occur.
- If the primary key is set, the value of the primary key (either a single field or a concatenation of fields) is stored in the ID field.
  - In some cases, this value will be hashed, but that depends on the source type and version.
  - If there is a field named ID in the original data, the platform will overwrite that value.
  - If needed, we can change the primary key field from ID to any other name (we usually use panoply_pk as it is unique enough and will not appear in the original data).
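The rules above can be sketched as a small Python function. This is a hedged illustration only, not the code our data sources run: the function name resolve_id, the "|" separator used when concatenating fields, and the choice of SHA-256 for hashing are all assumptions, since the actual behavior depends on the source type and version.

```python
import hashlib
import uuid


def resolve_id(record, primary_key=None, hash_key=False, sep="|"):
    """Return the value that would be stored in the ID field for `record`.

    primary_key: list of field names, or None if no primary key is set.
    hash_key:    whether to hash the key value (assumed SHA-256 here).
    sep:         hypothetical separator for multi-field keys.
    """
    if primary_key:
        # A primary key is set: its value (one field or a concatenation)
        # is stored in ID, overwriting any ID present in the original data.
        value = sep.join(str(record[field]) for field in primary_key)
        if hash_key:
            value = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return value
    if "ID" in record:
        # No primary key, but the original data has an ID field: keep it.
        return record["ID"]
    # Neither a primary key nor an ID field: generate a UUID per record.
    # A new UUID is produced on every run, which is why duplicates can occur.
    return str(uuid.uuid4())


kept = resolve_id({"ID": 7, "name": "Ada"})
composite = resolve_id({"ID": 7, "a": 1, "b": 2}, primary_key=["a", "b"])
generated = resolve_id({"name": "Ada"})
```

Here kept is the original ID value, composite is built from the primary key fields and replaces the original ID, and generated is a fresh UUID that differs on each call.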
Please reach out to Panoply’s support team for more information on setting up or configuring data sources.