Custom Scripting
Banyan Custom Scripting enables your team to instantly scale custom Julia and Python scripts to the cloud, running them in massively parallel computing sessions. There are several advantages to running custom scripts in the cloud:
- Running code with massive parallelism
- Running code closer to where data is stored (e.g., in Amazon S3)
- Running code along with other code that requires parallelism
Custom Code Regions
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
The `offloaded` function allows a user to offload a function to run on workers in a cloud computing session. To use `offloaded`, you should first start a session. Then you may pass an anonymous function to `offloaded`:

```julia
res = offloaded(() -> -1)
```

`offloaded` returns the result of the function running on a single worker. Here is another example:
```julia
using AWSS3

res2 = offloaded() do
    return read(S3Path("s3://some_cluster_bucket/some_path"))
end
```
Note that packages loaded locally (such as AWSS3 here) are also available for use in the offloaded code. Offloading a read from S3 can be faster because the read happens in your AWS Virtual Private Cloud, which is closer to your data in Amazon S3, and the result is quickly transmitted back to your local machine over Amazon SQS.
To pass data into offloaded code, simply pass it into `offloaded`:
```julia
x = 5

res3 = offloaded(x) do inp
    if inp == 1
        return -1
    else
        return 1
    end
end

res4 = offloaded(inp -> -1, x)
```
You may also call `offloaded` with `offloaded(my_function, distributed=true)` to run the function on all the workers in the running session, with the result returned from the worker at index 1.
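To make the calling convention concrete, here is a hypothetical single-process Python model of it. The real `offloaded` is Julia-only and ships the closure to remote workers; this `offloaded` stand-in and its `distributed` flag exist only to illustrate how arguments are passed in and how a single result comes back.

```python
# Hypothetical single-process model of the `offloaded` calling convention.
# The real function runs the closure on cloud workers; this sketch only
# illustrates argument passing and return semantics described above.

def offloaded(f, *args, distributed=False):
    # With distributed=True, every worker would run f, and the caller
    # would receive the result from the worker at index 1. In this
    # local model there is only one "worker", so we just call f.
    return f(*args)

res = offloaded(lambda: -1)                              # no arguments
res3 = offloaded(lambda inp: -1 if inp == 1 else 1, 5)   # pass data in
print(res, res3)  # -1 1
```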
Custom Code File
With `run_session`, you can write code in separate files and run the code on all the workers in a computing session.
```julia
run_session(
    cluster_name = "Simple Models",
    session_name = "shallow water modeling",
    nworkers = 64,
    code_files = ["model.jl"]
)
```
```python
run_session(
    cluster_name="Simple Models",
    session_name="shallow water modeling",
    nworkers=64,
    code_files=["model.py"]
)
```
`run_session` will run all the given `code_files` in the given order. It will wait (block) until the code finishes.
Custom Parallel Computing
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
Banyan provides a few helper functions for easy parallel computing:
- `get_nworkers()` returns the number of workers in the current session
- `get_worker_idx()` returns the index of the current worker that code is running on, ranging from 1 to `get_nworkers()`
- `split_across(1:100)` evenly splits the given array or iterable object and returns the portion assigned to the current worker
- `reduce_across(add, piece)` accepts a function that takes two values and returns a "reduced" value; it takes a value from each worker and returns the result of the reduction across all workers to each worker
- `sync_across()` synchronizes all the workers so that they don't continue running code until they have all run this line
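To make the division of work concrete, here is a hypothetical single-process Python model of `split_across` and `reduce_across`. The real helpers are Julia-only and are called once per worker inside a session; here the worker index and worker count are simulated parameters, and we loop over workers locally to sum the numbers 1 through 100.

```python
# Hypothetical single-process model of split_across / reduce_across.
# In a real session each worker calls these once; here we loop over
# simulated worker indices (1..nworkers, matching get_worker_idx()).

def split_across(items, worker_idx, nworkers):
    # Evenly split `items`, giving each worker a contiguous portion.
    items = list(items)
    base, extra = divmod(len(items), nworkers)
    # Workers 1..extra each receive one extra element.
    start = (worker_idx - 1) * base + min(worker_idx - 1, extra)
    stop = start + base + (1 if worker_idx <= extra else 0)
    return items[start:stop]

def reduce_across(op, per_worker_values):
    # Combine one value from each worker with a two-argument function.
    result = per_worker_values[0]
    for v in per_worker_values[1:]:
        result = op(result, v)
    return result

nworkers = 4
pieces = [split_across(range(1, 101), i, nworkers) for i in range(1, nworkers + 1)]
partial_sums = [sum(p) for p in pieces]
total = reduce_across(lambda a, b: a + b, partial_sums)
print(total)  # 5050
```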
Available Technologies
There are several technologies that are readily available to any Custom Scripting user:
- MPI/mpi4py
- Packages that use MPI/mpi4py
- Amazon S3 and Amazon EFS
MPI and HDF5
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
Banyan automatically sets up MPI on every cluster. It installs OpenMPI along with parallel HDF5. By the time your code files run, MPI has already been initialized, so you can proceed to use any MPI.jl function in your custom scripts and code regions.
MPI.jl-Using Packages
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
There are plenty of Julia packages that use MPI. For example, check out our example notebook using Oceananigans.jl for massively parallel shallow water modeling running in a Banyan session.
Amazon S3 and Amazon EFS
You can also easily use Amazon S3 and Amazon EFS from any custom script!
Every Banyan cluster has an S3 bucket and an EFS file system that it can read from and write to. See the Clusters table on the Banyan dashboard for the S3 bucket name.
To access the S3 bucket, read and write files in the `s3/<bucket_name>/` directory. To access the EFS file system, read and write files in the `efs/` directory.
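As an illustration of these mount-path conventions, the snippet below builds file paths under the two directories. The bucket and file names are hypothetical, and in a real session these paths are only valid on the cluster's workers:

```python
from pathlib import Path

# Hypothetical bucket and file names; substitute your cluster's S3
# bucket name from the Clusters table on the Banyan dashboard.
bucket_name = "some_cluster_bucket"

# Files under s3/<bucket_name>/ are read from / written to the S3 bucket.
s3_output = Path("s3") / bucket_name / "results" / "output.csv"

# Files under efs/ are read from / written to the shared EFS file system.
efs_scratch = Path("efs") / "scratch" / "partial_sums.bin"

print(s3_output)    # s3/some_cluster_bucket/results/output.csv
print(efs_scratch)  # efs/scratch/partial_sums.bin
```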