BanyanImages.jl

BanyanImages.jl is the easiest way to start processing large image datasets with the powerful and expressive Julia programming language. It aims to make switching between Images.jl and BanyanImages.jl as seamless as possible.

Getting Started

To get started with BanyanImages.jl, follow the steps here to set up Banyan.jl.

Then, open the Julia REPL and press ] to enter "Pkg" (package) mode and run add BanyanImages. (Ensure you have also added Banyan and BanyanArrays first.)

Finally, exit the package mode and start a session.

using Banyan, BanyanArrays, BanyanImages

start_session(
    cluster_name="satelite-image-decoding",
    nworkers=128,
    session_name="Decoding-with-Model-A",
    email_when_ready=true
)

Awesome! You can now use the functions described below for massively parallel data processing in this session of 128 workers.

Reading Images

Functions

BanyanImages.jl provides read_png and read_jpg for reading image files.

You may pass in one of:

- A single path
- A list of paths
- A 2-tuple of (1) an iterable range/generator and (2) a function that operates on each iterated element to return a path
- A 3-tuple of (1) an object, (2) an iterable, and (3) a function of two arguments, the object and each iterated element, that returns a path
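The accepted input forms can be pictured as different ways of describing a flat list of paths. The sketch below is plain Julia and does not call BanyanImages.jl. `expand_paths` is a hypothetical helper written only for illustration, the bucket name `my-bucket` is made up, and the argument order `f(object, element)` for the 3-tuple form is an assumption.

```julia
# Hypothetical helper, NOT part of BanyanImages.jl: it only illustrates how each
# accepted input form could be expanded into a flat list of paths.

# A single path
expand_paths(path::AbstractString) = [path]

# A list of paths
expand_paths(paths::AbstractVector{<:AbstractString}) = collect(paths)

# A 2-tuple of (iterable, f), where f maps each iterated element to a path
function expand_paths(spec::Tuple{Any,Function})
    iter, f = spec
    return vec([f(x) for x in iter])
end

# A 3-tuple of (object, iterable, f); assumed here to be called as f(object, element)
function expand_paths(spec::Tuple{Any,Any,Function})
    obj, iter, f = spec
    return vec([f(obj, x) for x in iter])
end

single   = expand_paths("s3://my-bucket/scene.png")
listed   = expand_paths(["s3://my-bucket/a.png", "s3://my-bucket/b.png"])
ranged   = expand_paths((1:3, i -> "s3://my-bucket/tile_$i.jpg"))
prefixed = expand_paths(("s3://my-bucket/tiles", 1:2, (base, i) -> "$base/$i.jpg"))
```

The tuple forms are convenient when paths follow a pattern, since the paths are described by a small generator rather than written out in full.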

A path may be either an Internet-hosted location (e.g., from a web API) beginning with https:// or http://, or an Amazon S3 location beginning with s3://.

Specify add_channelview=true to add an additional dimension for RGB channels.

Reusing an S3 bucket for different clusters

When reading images, files are downloaded onto the cluster in use at sample collection time, and metadata is cached in the S3 bucket for that cluster. If you use the same S3 bucket with a different cluster, be sure to read with metadata_invalid=true so that the data is read from the Internet onto your new cluster.

Example

An example of passing in a 2-tuple of an iterable and a function that returns a path:

using IterTools

files = (  # expands to 100 paths, one per (i, j) pair
    IterTools.product(1:10, 1:10),
    (i, j) -> "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/MODIS_Terra_CorrectedReflectance_TrueColor/default/2012-07-09/250m/6/$i/$j.jpg"
)
data = BanyanImages.read_jpg(files; add_channelview=true)  # `add_channelview=true` adds a dimension for the RGB channels

See the notebook for a full example of PyTorch-based satellite image decoding with BanyanImages.jl and BanyanONNXRunTime.jl.

Note that you must compute or write a result for computation to happen.

Notes

Please see Using Amazon S3 for instructions on setting up Amazon S3. To read images in S3, your cluster must have been created with access to the S3 bucket that contains the images you are working with.

If you want to read from images in an S3 bucket that your cluster was not created with access to, you may need to destroy your cluster and create a new cluster with access to the desired S3 bucket.

When reading images, a sample must be collected. Find out how to collect a sample faster and how to preserve cached samples after writing.

Processing Images

Reading an image or set of images produces a Banyan array where the images are concatenated along the first dimension. You may process the Banyan array using functions from BanyanArrays.jl such as mapslices.
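As a rough illustration of per-image processing, the sketch below uses an ordinary Julia array as a stand-in for a Banyan array, with images stacked along the first dimension as described above. It calls `mapslices` from Base Julia. That the BanyanArrays.jl version accepts the same arguments is an assumption here, and the 4×8×8 shape and gray levels are made up for the example.

```julia
using Statistics

# Stand-in for a Banyan array: 4 grayscale "images" of 8×8 pixels,
# concatenated along the first dimension.
images = Array{Float64}(undef, 4, 8, 8)
for k in 1:4
    images[k, :, :] .= k / 4   # image k has a constant gray level of k/4
end

# dims=(2, 3) passes each 8×8 image slice to `mean`, one image at a time.
brightness = mapslices(mean, images; dims=(2, 3))

# Drop the singleton dimensions to get one mean brightness per image.
per_image = vec(brightness)
```

Keeping the image index as the first dimension means any function of a single image can be applied across the whole dataset by slicing over the remaining dimensions.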