vignettes/civis_scripts.Rmd
civis_scripts.Rmd
Civis Scripts are the way to productionize your code with Civis Platform. You’ve probably used three of the four types of scripts already in the Civis Platform UI (“Code” –> “Scripts”): language (R, Python3, javascript, and sql), container, and custom. If you’ve run any of these scripts in Civis Platform, you’ve already started productionizing your code. Most loosely, productionizing means that your code now runs on a remote server instead of your local or development machine.
You probably already know some of the benefits too:
This guide will cover how to programmatically do the same tasks using the API that you are used to doing in GUI. Instead of typing in values for the parameters or clicking to download outputs, you can do the same thing in your programs. Hooray for automation!
Specifically, this guide will cover how to programmatically read outputs, kick off new script runs, and publish your own script templates to share your code with others. It will make heavy use of API functions directly, but highlight convenient wrappers for common tasks where they have been implemented already.
Ready? Buckle in!
A script is a job that executes code in Civis Platform. A script accepts user input through parameters, gives values back to the user as run outputs, and records any logs along the way.
A script author can share language and container scripts with others by letting users clone the script. But if an author makes a change to the script such as fixing a bug or adding a feature, users will have to re-clone the script to get access to those changes.
A better way to share code with others is with template scripts. A template script is a ‘published’ language or container script. The script that the template runs is the backing script of the template.
Once a container or language script is published as a template, users can create their own instances of the template. These instances are called custom scripts and they inherit all changes made to the template. This feature makes it easy to share code with others and to rapidly deploy changes and fixes.
# create a container script with a parameter script <- scripts_post_containers( required_resources = list(cpu = 1024, memory = 50, diskSpace = 15), docker_command = 'cd /package_dir && Rscript inst/run_script.R', docker_image_name = 'civisanalytics/datascience-r', name = 'SCRIPT NAME', params = list( list(name = 'NAME_OF_ENV_VAR', label = 'Name User Sees', type = 'string', required = TRUE) ) ) # publish the container script as a template template <- templates_post_scripts(script$id, name = 'TEMPLATE NAME', note = 'Markdown Docs') # run a template script, returning file ids of run outputs out <- run_template(template$id) # post a file or JSONValue run output within a script write_job_output('filename.csv') json_values_post(jsonlite::toJSON(my_list), 'my_list.json') # get run output file ids of a script out <- fetch_output_file_ids(civis_script(id)) # get csv run outputs of a script df <- read_civis(civis_script(id), regex = '.csv', using = read.csv) # get JSONValue run outputs my_list <- read_civis(civis_script(id))
Let’s make these concepts concrete with an example! We’ll use the ‘R’ language script throughout, but container
scripts work exactly the same way. In the second section, we’ll cover custom
and template
scripts.
The post
method creates the job and returns a list of metadata about it, including its type.
source <- c(' print("Hello World!") ') job <- scripts_post_r(name = 'Hello!', source = source)
Each script can be uniquely identified by its job id. If you have a job id but don’t know what kind of script it is, you can do jobs_get(id)
.
Each script type is associated with its own API endpoints. For instance, to post a job of each script type, you need scripts_post_r
, scripts_post_containers
, scripts_post_custom
, or templates_post_scripts
.
This job hasn’t been run yet. To kick off a run do:
run <- scripts_post_r_runs(job$id) # check the status scripts_get_r_runs(job$id, run$id) # automatically poll until the job completes await(scripts_get_r_runs, id = job$id, run_id = run$id)
Since kicking off a job and polling until it completes is a really common task for this guide, let’s make it a function:
run_script <- function(source, name = 'Cool') { job <- scripts_post_r(name = name, source = source) run <- scripts_post_r_runs(job$id) await(scripts_get_r_runs, id = job$id, run_id = run$id) }
This script isn’t very useful because it doesn’t produce any output that we can access. To add an output to a job, we can use scripts_post_r_runs_outputs
. The two most common types of run outputs are Files
and JSONValues
.
We can specify adding a File
as a run output by uploading the object to S3 with write_civis_file
and setting object_type
in scripts_post_r_runs_outputs
to File
. Notice that the environment variables CIVIS_JOB_ID
and CIVIS_RUN_ID
are automatically inserted into the environment for us to have access to.
source <- c(" library(civis) data(iris) write.csv(iris, 'iris.csv') job_id <- as.numeric(Sys.getenv('CIVIS_JOB_ID')) run_id <- as.numeric(Sys.getenv('CIVIS_RUN_ID')) file_id <- write_civis_file('iris.csv') scripts_post_r_runs_outputs(job_id, run_id, object_type = 'File', object_id = file_id) ") run <- run_script(source)
Since this pattern is so common, we replaced it with the function write_job_output
which you can use to post a filename as a run output for any script type.
source <- c(" library(civis) data(iris) write.csv(iris, 'iris.csv') write_job_output('iris.csv') ") run <- run_script(source)
It is best practice to make run outputs as portable as possible because the script can be called by any language. For arbitrary data, JSONValues are often the best choice. Regardless, it is user friendly to add the file extension to the name of the run output.
Adding JSONValue run outputs is common enough for it to be implemented directly as a Civis API endpoint, json_values_post
:
source <- c(" library(civis) library(jsonlite) my_farm <- list(cows = 1, ducks = list(mallard = 2, goldeneye = 1)) json_values_post(jsonlite::toJSON(my_farm), name = 'my_farm.json') ") run_farm <- run_script(source)
To retrieve script outputs we can use scripts_list_r_runs_outputs
:
out <- scripts_list_r_runs_outputs(run$rId, run$id) iris <- read_civis(out$objectId, using = read.csv)
Since this pattern is also common, you can simply use read_civis
directly. This will work for any script type. Use regex
and using
to filter run outputs by file extension, and provide the appropriate reading function. JSONValues can be read automatically.
# get csv run outputs iris <- read_civis(civis_script(run$rId), regex = '.csv', using = read.csv) # get JSONValues my_farm <- read_civis(civis_script(run_farm$rId))
Scripts are more useful if their behavior can be configured by the user, which can be done with script parameters. Script parameters are placeholders for input by the user. Specific values of the parameters input by the user are called arguments. Here, we modify run_script
to automatically add a parameter, and simultaneously take a value of that parameter provided by the user. In the script itself, we can access the parameter as an environment variable.
# Add 'params' and 'arguments' to run_script run_script <- function(source, args, name = 'Cool') { params <- list( # params is a list of individual parameters list( name = 'PET_NAME', # name of the environment variable with the user value label = 'Pet Name', # name displaayed to the user type = 'string', # type required = TRUE # required? ) ) job <- scripts_post_r(name = name, source = source, params = params, arguments = args) run <- scripts_post_r_runs(job$id) await(scripts_get_r_runs, id = job$id, run_id = run$id) } # Access the PET_NAME variable source <- c(' library(civis) pet_name <- Sys.getenv("PET_NAME") msg <- paste0("Hello", pet_name, "!") print(msg) ') # Let's run it! Here we pass the argument 'Fitzgerald' to the # parameter 'PET_NAME' that we created. run_script(source, name = 'Pet Greeting', args = list(PET_NAME = 'Fitzgerald'))