Duplicate data can occur when a data source does not have a default primary key (PK). This is the case for some types of data sources, such as databases and file uploads. Panoply relies on primary keys to decide how to handle the data imported from your data source. These keys can be default (provided by the data source) or user-configured (set on the data source configuration page).
For an in-depth explanation, please read Panoply's documentation on why duplicate data exists.
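Once you've correctly specified or configured a primary key for your data source, Panoply rarely creates duplicate entries. This is because of its upsert mechanism: an incoming record whose primary key already exists in the table updates that row instead of being inserted again.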
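The following is a minimal Python sketch of upsert semantics in general. It shows why a primary key prevents duplicates. It is not Panoply's implementation. The functions upsert and append_only, and the dict standing in for a table, are hypothetical. A record whose key already exists replaces the stored row. A source with no key can only append, so re-importing the same record creates a duplicate.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def upsert(table: Dict[Tuple[Any, ...], Record],
           batch: Iterable[Record],
           pk_fields: List[str]) -> None:
    """Insert each record, or overwrite the stored row that has the same key."""
    for record in batch:
        # Build the key from the PK fields (one field, or several for a composite key).
        key = tuple(record[field] for field in pk_fields)
        # An existing key is updated in place instead of being added again.
        table[key] = record


def append_only(rows: List[Record], batch: Iterable[Record]) -> None:
    """With no key to match on, every incoming record is simply appended."""
    rows.extend(batch)


first_import = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
second_import = [{"id": 2, "name": "b (updated)"}, {"id": 3, "name": "c"}]

keyed: Dict[Tuple[Any, ...], Record] = {}
upsert(keyed, first_import, ["id"])
upsert(keyed, second_import, ["id"])
print(len(keyed))  # 3: the record with id 2 was updated, not duplicated

unkeyed: List[Record] = []
append_only(unkeyed, first_import)
append_only(unkeyed, second_import)
print(len(unkeyed))  # 4: the record with id 2 now appears twice
```

The dict keyed by a tuple of PK values handles single-field and composite keys the same way. This mirrors the idea that a PK can be one field or several.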
If you suspect that one of your data source's tables contains duplicate records, go through these steps:
- Make sure that the table is populated by a Panoply data source and not by an external one.
- Make sure that a primary key is set for the source. A primary key can be established for a Panoply data source in several ways:
  - If the source data (database and file-based sources) contains a field named id, Panoply automatically detects it and uses it as the PK.
  - For API data sources, Panoply usually sets a default PK provided by the API. This PK is not necessarily visible in the data source's advanced options.
  - On the data source's configuration page, under the Advanced section, enter a value in the Primary Key field. The PK can be a single field or a composite of several fields. In either case, wrap each field in the PK in curly brackets. For example:
    - PK with one field: {field1}
    - PK with two fields: {field1}-{field2}
- Run the following query to check whether your table contains duplicate records (replace table_name with the name of your table):
    SELECT id, COUNT(*)
    FROM table_name
    GROUP BY id
    HAVING COUNT(*) > 1;
- Contact support@panoply.io with the duplicated ID and the table name for further assistance.
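The duplicate check above can be tried locally with Python's built-in sqlite3 module. This sketch uses a hypothetical in-memory table named table_name with made-up rows. It does not connect to Panoply. The second query assumes that each field of a composite PK such as {field1}-{field2} is stored as its own column. In that case, you would group by all of those columns instead of by id.

```python
import sqlite3

# Hypothetical sample table; against Panoply you would run the SQL on your own table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "b", "y"), (2, "b", "y"), (3, "a", "z")],
)

# Single-field PK: the same query as in the steps above.
single_pk_dupes = conn.execute(
    """
    SELECT id, COUNT(*)
    FROM table_name
    GROUP BY id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(single_pk_dupes)  # [(2, 2)]

# Composite PK such as {field1}-{field2}: group by every field in the key
# (assumes each key field is its own column).
composite_pk_dupes = conn.execute(
    """
    SELECT field1, field2, COUNT(*)
    FROM table_name
    GROUP BY field1, field2
    HAVING COUNT(*) > 1
    """
).fetchall()
print(composite_pk_dupes)  # [('b', 'y', 2)]

conn.close()
```

Each returned row is one key value that appears more than once, together with how many times it occurs. That key value and the table name are the details to include when contacting support.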