start_session()
Start a session on a cluster.
start_session(cluster_name=nothing, nworkers=16, release_resources_after=20, print_logs=false, store_logs_in_s3=true, store_logs_on_cluster=false, sample_rate=nworkers, session_name=nothing, files=nothing, code_files=nothing, force_update_files=false, pf_dispatch_table=nothing, using_modules=[], url=nothing, branch=nothing, directory=nothing, dev_paths=nothing, force_clone=false, force_pull=false, force_install=false, nowait=false, ...)
Description
This function starts a session on a particular cluster. Sessions are allocated
with a fixed number of workers. If the requested workers are not yet running but
can be allocated, the first computation after starting the session may take
significantly longer while compute nodes spin up. Computation can be run on this
session until either an explicit end_session command is sent or a failure occurs.
The session remains in the running state until it is ended or fails.
Stdout and stderr output will get written to a log file on the
cluster that gets deleted after the session completes. This file can be optionally
written to the cluster's S3 bucket, persisted on the cluster, and/or returned
and printed out on the client console.
Optional Parameters
cluster_name
(String)
Name of the cluster to run the session on. The cluster must be in the running
status. If no cluster is specified, the session runs on any available cluster.
nworkers
(Int)
Number of workers to allocate for this session. Defaults to 16. The session cannot
be started if the requested number of workers is not available. In this case, you
should either end a currently running session or create a cluster with more compute
resources (for example, by increasing max_num_nodes or specifying an EC2 instance
type with more compute).
print_logs
(Bool)
Indicates whether the session log file should get returned to and printed out on the client console. The logs are printed as each evaluation occurs. Defaults to false.
Note that this may be slow if the size of the log file generated is large.
store_logs_in_s3
(Bool)
Indicates whether to write the session log file to the cluster's S3 bucket. If this is set to true, the log file will be written to S3 only after the session is ended. Defaults to true.
store_logs_on_cluster
(Bool)
Indicates whether to persist the log file on disk on the cluster. Defaults to false.
sample_rate
(Int)
Sampling rate to use for collecting samples of data. These samples are used internally by Banyan to estimate various properties of data such as its size or skew. Estimating various properties of data is critical for a good partitioning that is unlikely to run out of memory.
The default sample rate is set to the number of workers for the session so that a sample is approximately the size of data that would be on a single worker. You should not need to set this to anything other than the default, but if you have a small number of workers, you may want to increase the sample rate to avoid running out of memory on the client side because of a sample being too large.
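The relationship between sample rate and sample size can be sketched with a little arithmetic. This is only a rough illustration: it assumes a sample holds about one out of every sample_rate units of data, which is an interpretation of the description above and not a documented formula. The helper name estimated_sample_bytes and the 64 GiB dataset size are hypothetical.

```julia
# Rough estimate of how much data a sample pulls to the client,
# ASSUMING the sample holds about 1/sample_rate of the full dataset.
function estimated_sample_bytes(total_bytes::Integer, sample_rate::Integer)
    sample_rate > 0 || throw(ArgumentError("sample_rate must be positive"))
    return cld(total_bytes, sample_rate)  # ceiling division
end

total_bytes = 64 * 2^30  # hypothetical 64 GiB dataset

# Compare a few candidate sample rates for the same dataset.
for rate in (4, 16, 64)
    gib = estimated_sample_bytes(total_bytes, rate) / 2^30
    println("sample_rate=$rate: roughly $gib GiB per sample")
end
```

Under this assumption, a session with only a few workers would produce a large sample at the default rate, which is why raising sample_rate can help on a memory-constrained client.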
session_name
(String)
Name of the session that can be used to find session logs on the Banyan Dashboard.
files
(List)
List of paths to files to be copied onto the cluster. These files can be on the
Internet, in an S3 bucket, or on local disk. Each path must be prefixed by one of
http://, https://, s3://, or file://.
code_files
(List)
List of paths to code files to be copied onto the cluster and included in the
evaluation for this session. These files can be on the Internet, in an S3 bucket,
or on local disk. Each path must be prefixed by one of http://, https://, s3://,
or file://.
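Because both files and code_files require one of the four prefixes listed above, it can be convenient to check paths on the client before starting a session. The following is a minimal sketch in plain Julia; the helpers has_allowed_prefix and check_paths are hypothetical and not part of Banyan, and the example paths are placeholders.

```julia
# Prefixes accepted for `files` and `code_files`, as listed in this reference.
const ALLOWED_PREFIXES = ("http://", "https://", "s3://", "file://")

# True if the path starts with one of the accepted prefixes.
has_allowed_prefix(path::AbstractString) =
    any(prefix -> startswith(path, prefix), ALLOWED_PREFIXES)

# Return the paths unchanged, or throw if any path lacks an accepted prefix.
function check_paths(paths)
    bad = filter(!has_allowed_prefix, paths)
    isempty(bad) ||
        throw(ArgumentError("paths missing a supported prefix: " * join(bad, ", ")))
    return paths
end

# Placeholder paths for illustration only.
code_files = check_paths([
    "s3://my-bucket/scripts/preprocess.jl",
    "file:///home/me/project/helpers.jl",
])
```

The validated list could then be passed as files=... or code_files=... when calling start_session.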
force_update_files
(Bool)
By default, files are only uploaded to the cluster if they do not already exist there. If a file has changed, set this parameter to true to force updating it on the cluster. Force updating will force new underlying resources to be used for this session. Defaults to false.
pf_dispatch_table
(Vector{String})
Path to the PF (partitioning function) dispatch tables to use for this session.
url
(String)
URL of a public GitHub repository that contains the project environment to be
used for this session. This repository is cloned onto the cluster and the computation
for this session is run using this environment. Use directory to specify the path
to the project environment within the repository.
branch
(String)
Branch of the GitHub repository specified by url to use. Should be used with url.
directory
(String)
Path within the GitHub repository specified by url to the directory that
contains the project environment. The path should begin with the repository name,
such as repo-name/PATH_TO_DIR_CONTAINING_PROJECT_TOML. Should be used with url.
dev_paths
(List)
List of paths within the repository to mark as in development. The development versions of these packages are used instead of the public versions.
force_clone
(Bool)
Indicates whether the repository should be recloned. Note that this will likely cause any other running session using the same GitHub repository and branch to fail.
force_pull
(Bool)
Indicates whether git pull should be run on the branch the repository currently
has checked out.
force_install
(Bool)
Indicates whether the General registry for Julia should be removed on the cluster. This is useful when the registry is corrupted or needs to be updated. Note that this will likely cause any other running sessions to fail.
nowait
(Bool)
Indicates whether to return without waiting for job creation to complete. Defaults
to false, in which case the call blocks until job creation is complete.
email_when_ready
(Bool)
If true, the user receives an email notification (sent to the email address chosen
when signing up for Banyan) when the session is ready with workers running. A session
may take anywhere from 10 seconds to 30 minutes to start running. Defaults to false.
Returns (String)
Session ID used to reference the session.
Example Usage
start_session(cluster_name="weekly-data-playground")
start_session(
    cluster_name="weekly-data-playground",
    nworkers=64,
    session_name="MNIST-Playground",
    email_when_ready=true
)
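The repository-related parameters (url, branch, directory) are meant to be used together, so a combined example may help. The sketch below collects the options in a named tuple and leaves the actual start_session call commented out, since it requires Banyan to be installed and configured. The repository URL, branch, and directory are hypothetical placeholders, and the consistency check is only an illustration of the rule that directory should start with the repository name.

```julia
# Hypothetical values; replace them with your own cluster and repository.
session_options = (
    cluster_name = "weekly-data-playground",
    nworkers = 64,
    url = "https://github.com/example-org/example-repo",
    branch = "main",
    # Starts with the repository name, as the `directory` parameter requires.
    directory = "example-repo/analysis",
    print_logs = true,
)

# Illustrative client-side check: `directory` should begin with the repo name.
repo_name = basename(session_options.url)
startswith(session_options.directory, repo_name) ||
    error("directory should start with the repository name: $repo_name")

# With Banyan loaded and configured, the options would be passed as keywords:
# using Banyan
# session_id = start_session(; session_options...)
```

Keeping the options in one place makes it easy to reuse the same configuration across sessions and to change only the branch or worker count.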