The "Select All" option in the Files selection of the S3 data source allows you to ingest all the files that exist in the path you've entered (including files in subfolders).

When the "Select All" option is selected, the data source extracts every file in the specified path that was modified since the last successful run of the S3 data source. If you rerun the data source, it automatically skips any file that hasn't been modified since that run, so only new or changed files are ingested.
If you need to recollect older files, see the Recollecting files article for the available options.
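
To make the incremental behavior concrete, here is a minimal Python sketch of the selection logic. It is an illustration only, not the data source's actual implementation. The function name `select_files_to_ingest`, the sample keys, and the strict "later than the last successful run" comparison are all assumptions; the `Key` and `LastModified` fields mirror the shape of entries in an S3 object listing.

```python
from datetime import datetime, timezone

def select_files_to_ingest(objects, prefix, last_successful_run):
    """Return the keys under `prefix` (subfolders included) that were
    modified after `last_successful_run`. Hypothetical helper.

    When `last_successful_run` is None (no successful run yet),
    every file under the prefix is selected.
    """
    selected = []
    for obj in objects:
        # Prefix matching also covers files in subfolders.
        if not obj["Key"].startswith(prefix):
            continue
        # Assumption: a file qualifies only if it changed after the last run.
        if last_successful_run is None or obj["LastModified"] > last_successful_run:
            selected.append(obj["Key"])
    return selected

def utc(day):
    return datetime(2024, 5, day, tzinfo=timezone.utc)

# Made-up listing for illustration.
objects = [
    {"Key": "exports/2024/a.csv", "LastModified": utc(1)},
    {"Key": "exports/2024/sub/b.csv", "LastModified": utc(3)},
    {"Key": "exports/2024/c.csv", "LastModified": utc(4)},
    {"Key": "other/d.csv", "LastModified": utc(5)},
]

first_run = select_files_to_ingest(objects, "exports/2024/", None)
rerun = select_files_to_ingest(objects, "exports/2024/", utc(2))
```

In this sketch the first run picks up all three files under `exports/2024/`, while the rerun skips `a.csv` because it was not modified after the last successful run.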
Note that although files shouldn't load more than once, they still might, for example if a run fails and retries, or if the file is actually modified on your end. For this reason, we always recommend setting a primary key for any file system data source such as S3.
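
The sketch below shows why a primary key helps when the same file is loaded twice. It is a simplified illustration that assumes the destination upserts rows by primary key; the `upsert_rows` helper, the `order_id` column, and the sample rows are hypothetical.

```python
def upsert_rows(table, rows, primary_key):
    """Insert rows into `table` (a dict keyed by primary key value),
    overwriting any existing row that has the same key."""
    for row in rows:
        table[row[primary_key]] = row
    return table

# Made-up contents of one file, loaded twice (for example after a retry).
first_load = [
    {"order_id": 1, "status": "new"},
    {"order_id": 2, "status": "new"},
]
second_load = [
    {"order_id": 1, "status": "new"},
    {"order_id": 2, "status": "shipped"},  # the file was modified in between
]

table = {}
upsert_rows(table, first_load, "order_id")
upsert_rows(table, second_load, "order_id")

row_count = len(table)
```

With the primary key in place, the second load updates the existing rows instead of appending duplicates, so the table still holds one row per `order_id`.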