Custom Scripting
Banyan Custom Scripting enables your team to instantly scale custom Julia and Python scripts to the cloud, running them in massively parallel computing sessions. There are several advantages to running custom scripts in the cloud:
- Running code with massive parallelism
- Running code closer to where data is stored (e.g., in Amazon S3)
- Running code along with other code that requires parallelism
Custom Code Regions
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
The `offloaded` function allows a user to offload a function to run on workers in a cloud computing session. To use `offloaded`, you should first start a session. Then you may pass an anonymous function to `offloaded`:

```julia
res = offloaded(() -> -1)
```

`offloaded` returns the result of the function running on a single worker. Here is another example:
```julia
using AWSS3

res2 = offloaded() do
    return read(S3Path("s3://some_cluster_bucket/some_path"))
end
```
Note that packages loaded locally (such as AWSS3 here) are also available for use in the offloaded code. Offloading a read from S3 can be faster because the read happens in your AWS Virtual Private Cloud, which is closer to your data in Amazon S3, and the result is quickly transmitted back to your local machine over Amazon SQS.
To pass data into offloaded code, simply pass it into `offloaded`:
```julia
x = 5

res3 = offloaded(x) do inp
    if inp == 1
        return -1
    else
        return 1
    end
end

res4 = offloaded(inp -> -1, x)
```
You may also call `offloaded` with `offloaded(my_function, distributed=true)` to run the function on all the workers in the running session, with the result returned from the worker at index 1.
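To make the calling convention concrete, here is a hypothetical single-process Python model of it. The real `offloaded` is Julia-only and ships the closure to remote workers; this `offloaded` stand-in and its `distributed` flag exist only to illustrate how arguments are passed in and how a single result comes back.

```python
# Hypothetical single-process model of the `offloaded` calling convention.
# The real function runs the closure on cloud workers; this sketch only
# illustrates argument passing and return semantics described above.

def offloaded(f, *args, distributed=False):
    # With distributed=True, every worker would run f, and the caller
    # would receive the result from the worker at index 1. In this
    # local model there is only one "worker", so we just call f.
    return f(*args)

res = offloaded(lambda: -1)                              # no arguments
res3 = offloaded(lambda inp: -1 if inp == 1 else 1, 5)   # pass data in
print(res, res3)  # -1 1
```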
Custom Code File
With `run_session`, you can write code in separate files and run the code on all the workers in a computing session.
```julia
run_session(
    cluster_name = "Simple Models",
    session_name = "shallow water modeling",
    nworkers = 64,
    code_files = ["model.jl"]
)
```
```python
run_session(
    cluster_name="Simple Models",
    session_name="shallow water modeling",
    nworkers=64,
    code_files=["model.py"]
)
```
`run_session` will run all the given `code_files` in the given order. It will wait (block) until the code finishes.
Custom Parallel Computing
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
Banyan provides a few helper functions for easy parallel computing:
- `get_nworkers()` returns the number of workers in the current session
- `get_worker_idx()` returns the index of the current worker that code is running on, ranging from 1 to `get_nworkers()`
- `split_across(1:100)` evenly splits the given array or iterable object and returns the portion assigned to the current worker
- `reduce_across(add, piece)` accepts a function that takes two values and returns a "reduced" value; it takes a value from each worker and returns the result of the reduction across all workers to each worker
- `sync_across()` synchronizes all the workers so that they don't continue running code until they have all run this line
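To make the division of work concrete, here is a hypothetical single-process Python model of `split_across` and `reduce_across`. The real helpers are Julia-only and are called once per worker inside a session; here the worker index and worker count are simulated parameters, and we loop over workers locally to sum the numbers 1 through 100.

```python
# Hypothetical single-process model of split_across / reduce_across.
# In a real session each worker calls these once; here we loop over
# simulated worker indices (1..nworkers, matching get_worker_idx()).

def split_across(items, worker_idx, nworkers):
    # Evenly split `items`, giving each worker a contiguous portion.
    items = list(items)
    base, extra = divmod(len(items), nworkers)
    # Workers 1..extra each receive one extra element.
    start = (worker_idx - 1) * base + min(worker_idx - 1, extra)
    stop = start + base + (1 if worker_idx <= extra else 0)
    return items[start:stop]

def reduce_across(op, per_worker_values):
    # Combine one value from each worker with a two-argument function.
    result = per_worker_values[0]
    for v in per_worker_values[1:]:
        result = op(result, v)
    return result

nworkers = 4
pieces = [split_across(range(1, 101), i, nworkers) for i in range(1, nworkers + 1)]
partial_sums = [sum(p) for p in pieces]
total = reduce_across(lambda a, b: a + b, partial_sums)
print(total)  # 5050
```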
Available Technologies
There are several technologies that are readily available to any Custom Scripting user:
- MPI/mpi4py
- Packages that use MPI/mpi4py
- Amazon S3 and Amazon EFS
MPI and HDF5
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
Banyan automatically sets up MPI on every cluster. It installs OpenMPI along with parallel HDF5. By the time your code files run, MPI has already been initialized, so you can proceed to use any MPI.jl function in your custom scripts and code regions.
MPI.jl-Using Packages
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
There are plenty of Julia packages that use MPI. For example, check out our example notebook using Oceananigans.jl for massively parallel shallow water modeling running in a Banyan session.
Amazon S3 and Amazon EFS
You can also easily use Amazon S3 and Amazon EFS from any custom script!
Every Banyan cluster has an S3 bucket and an EFS file system that it can read from and write to. See the Clusters table on the Banyan dashboard for the S3 bucket name.
To access the S3 bucket, read and write files in the `s3/<bucket_name>/` directory. To access the EFS file system, read and write files in the `efs/` directory.
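As an illustration of these mount-path conventions, the snippet below builds file paths under the two directories. The bucket and file names are hypothetical, and in a real session these paths are only valid on the cluster's workers:

```python
from pathlib import Path

# Hypothetical bucket and file names; substitute your cluster's S3
# bucket name from the Clusters table on the Banyan dashboard.
bucket_name = "some_cluster_bucket"

# Files under s3/<bucket_name>/ are read from / written to the S3 bucket.
s3_output = Path("s3") / bucket_name / "results" / "output.csv"

# Files under efs/ are read from / written to the shared EFS file system.
efs_scratch = Path("efs") / "scratch" / "partial_sums.bin"

print(s3_output)    # s3/some_cluster_bucket/results/output.csv
print(efs_scratch)  # efs/scratch/partial_sums.bin
```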