BanyanImages.jl
BanyanImages.jl is the easiest way to start processing large image datasets with the powerful and expressive Julia programming language. It aims to make switching between Images.jl and BanyanImages.jl as seamless as possible.
Getting Started
To get started with BanyanImages.jl, follow the steps here to set up Banyan.jl.
Then, open the Julia REPL, press ] to enter "Pkg" (package) mode, and run add BanyanImages. (Ensure you have also added Banyan and BanyanArrays first.)
Finally, exit package mode and start a session:
using Banyan, BanyanArrays, BanyanImages
start_session(
    cluster_name="satelite-image-decoding",
    nworkers=128,
    session_name="Decoding-with-Model-A",
    email_when_ready=true
)
Awesome! You can now use the functions described below for massively parallel data processing in this session of 128 workers.
Reading Images
Functions
BanyanImages.jl provides both read_png and read_jpg for reading in image files.
You may pass in one of:
- A single path
- A list of paths
- A 2-tuple of (1) an iterable range/generator and (2) a function that operates on each iterated element to return a path
- A 3-tuple of (1) an object, (2) an iterable, and (3) a function of two arguments, the object and each iterated element, that returns a path
A path may be either an Internet-hosted location (e.g., from a web API) beginning with https:// or http://, or an Amazon S3 location beginning with s3://.
Specify add_channelview=true to add an additional dimension for RGB channels.
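To make the tuple forms listed above more concrete, here is a minimal sketch in plain Julia of how each accepted input could expand into a flat list of paths. The helper `expand_paths`, the `example.com` URL, and the bucket name are hypothetical and exist only for illustration. This is not how BanyanImages.jl is implemented, and it assumes the function in a 2-tuple receives each iterated element as a single argument (so a tuple element is destructured in the argument list).

```julia
# Hypothetical helper, NOT part of BanyanImages.jl: it only illustrates how
# the accepted input forms could expand into a list of paths.

# A single path
expand_paths(path::AbstractString) = [path]

# A list of paths
expand_paths(paths::AbstractVector{<:AbstractString}) = collect(paths)

# 2-tuple: (iterable, f), where f maps each iterated element to a path
expand_paths(spec::Tuple{Any,Function}) =
    vec([spec[2](x) for x in spec[1]])

# 3-tuple: (object, iterable, f), where f maps (object, element) to a path
expand_paths(spec::Tuple{Any,Any,Function}) =
    vec([spec[3](spec[1], x) for x in spec[2]])

# 2-tuple usage: a 2 x 3 grid of tile indices becomes 6 paths.
# The URL is a placeholder, not a real endpoint.
tiles = expand_paths((
    Iterators.product(1:2, 1:3),
    ((i, j),) -> "https://example.com/tiles/$i/$j.jpg",
))

# 3-tuple usage: the shared object (a bucket prefix) is passed to every call.
named = expand_paths((
    "s3://my-bucket/images",   # hypothetical bucket
    ["a", "b"],
    (prefix, name) -> "$prefix/$name.png",
))
```

The 3-tuple form is useful when every path depends on some shared value, since that value is passed explicitly rather than captured in a closure.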
Reusing an S3 bucket for different clusters
When reading images, files are downloaded onto the cluster in use at sample collection time, and metadata is cached in the S3 bucket for that cluster. If the same S3 bucket is used for a different cluster, be sure to read with metadata_invalid=true so that the data is read from the Internet to your cluster.
Example
An example of passing in a 2-tuple of an iterable and a function that returns a path:
using IterTools  # needed for IterTools.product below
files = (  # expands to 100 paths (10 x 10 tile indices)
    IterTools.product(1:10, 1:10),
    (i, j) -> "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/MODIS_Terra_CorrectedReflectance_TrueColor/default/2012-07-09/250m/6/$i/$j.jpg"
)
# Specify `add_channelview` to add a dimension for the RGB channels
data = BanyanImages.read_jpg(files; add_channelview=true)
See the notebook for a full example of PyTorch-based satellite image decoding with BanyanImages.jl and BanyanONNXRunTime.jl.
Note that you must compute or write a result for computation to happen.
Notes
Please see Using Amazon S3 for instructions on setting up Amazon S3. To read images in S3, your cluster must have been created with access to the S3 bucket that contains the images you are working with.
If you want to read images from an S3 bucket that your cluster was not created with access to, you may need to destroy your cluster and create a new cluster with access to the desired S3 bucket.
When reading images, a sample must be collected. Find out how to collect a sample faster and how to preserve cached samples after writing.
Processing images
Reading an image or set of images produces a Banyan array where the images are concatenated along the first dimension. You may process the Banyan array using functions from BanyanArrays.jl such as mapslices.
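As a rough illustration of this layout, the sketch below runs on an ordinary in-memory Julia Array using Base's `mapslices`, not on a Banyan array. It assumes a stack shaped (number of images, height, width) with no channel dimension, and that the BanyanArrays.jl version of `mapslices` is called the same way as Base's; neither assumption is confirmed here. The variable names are made up for the example.

```julia
# Plain-Julia stand-in for a stack of images concatenated along dimension 1.
# Assumed shape: (n_images, height, width); values are synthetic.
n_images, height, width = 4, 2, 3
stack = reshape(Float32.(1:(n_images * height * width)), n_images, height, width)

# Scale each image so that its brightest pixel becomes 1.
# dims=(2, 3) passes one height x width slice per image to the function.
normalized = mapslices(img -> img ./ maximum(img), stack; dims=(2, 3))

# The result keeps the stacked layout: one normalized image per index of dim 1.
first_image = normalized[1, :, :]
```

Slicing over dimensions 2 and 3 keeps the per-image work independent, which suits data that is split along the first dimension.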