Split a Redshift table into multiple CSV parts on S3 and return their locations as file IDs and presigned S3 urls.
civis_to_multifile_csv( sql, database, job_name = NULL, hidden = TRUE, include_header = TRUE, compression = "gzip", delimiter = ",", unquoted = FALSE, prefix = NULL )
sql | string, The SQL select string to be executed. |
---|---|
database | string, Name of database where query is run. |
job_name | string, optional. Name to identify scripted sql job. |
hidden | logical, Whether to hide the query in platform. |
include_header | logical, optional. Whether to include headers as an element in the returned list. |
compression | string, optional, Type of compression to use, if any. One of 'none', 'zip', or 'gzip'. |
delimiter | string, optional. Which delimiter to use. One of ',', '\t' or '|'. |
unquoted | logical, optional. Whether or not to quote fields. |
prefix | string, optional. A user specified filename prefix for the output files to have. |
A list with the items:
header: column headers if 'include_header' is TRUE
query: the executed query
entries: a list containing presigned urls for each csv part
compression: the type of compression on each csv part
delimiter: the delimiter used to separate fields
unquoted: whether fields are quoted
When tables are large, unloading by splitting them first is faster. When we split a table, each Redshift compute node can dump its data into S3 in parallel with the others. By doing so, we avoid having all compute nodes sending the data through the leader node, which is slow.
This function returns a list that contains the location of the CSV parts as file IDs and presigned S3 urls. The user can use either the file IDs or the presigned S3 urls to download the CSV parts. The content of the list returned by this function is similar to that of the manifest file returned by Amazon S3 UNLOAD statements.
if (FALSE) { # Download a table into multiple csv parts sql <- "SELECT * FROM schema.table" database <- "important_database" manifest <- civis_to_multifile_csv(sql=sql, database=database) files <- lapply(manifest[["entries"]], function(x) { download_civis(x$id, x$name) x$name }) }