Defining Partitioning Functions
Julia-only
This feature is currently only supported in Julia. Python support coming soon.
Annotated code regions are lazily scheduled and executed by Banyan behind the scenes. The Banyan scheduler groups code regions into stages where each stage is a set of code regions that can share the same partition types for all the data they process. The scheduler also generates code to first split data across workers and loop iterations at the start of the stage and then to merge data back together at the end of the stage.
But how does the scheduler know how to split and merge data? It does so by dynamically dispatching partitioning functions based on the partition types of the data.
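To make this concrete, here is a minimal sketch of dispatch on partition types. This is not Banyan's actual internals; the type and function names are hypothetical. Modeling partition types as Julia types lets multiple dispatch choose the right splitting implementation:

```julia
# Illustrative sketch, not Banyan's internals: partition types modeled as
# Julia types so the right partitioning function is chosen by dispatch.

abstract type PartitionType end
struct Blocked <: PartitionType end   # contiguous, evenly sized blocks
struct Grouped <: PartitionType end   # elements assigned by a hash of their value

# Split `data` across `nworkers` workers, dispatching on the partition type.
split_across(data::Vector, ::Blocked, nworkers::Int) =
    [data[div((i - 1) * length(data), nworkers) + 1 : div(i * length(data), nworkers)]
     for i in 1:nworkers]

split_across(data::Vector, ::Grouped, nworkers::Int) =
    [[x for x in data if mod(hash(x), nworkers) + 1 == i] for i in 1:nworkers]

# split_across(collect(1:10), Blocked(), 2) yields two contiguous halves.
```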
WIP
Please note that the documentation on Extending Banyan is currently a work in progress.
What are Partitioning Functions?
There are three kinds of partitioning functions:
- Splitting functions split data across workers and/or loop iterations
- Merging functions merge data that was previously split
- Casting functions convert data split across workers from one partition type to another
To create a partitioning function, two things are required:
- A function that implements the splitting/merging/casting
- A description of the requirements on the partition type of any data to be split/merged/cast with this function
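A minimal sketch of these two pieces, with hypothetical names throughout (Banyan's real registration API differs; a plain Dict stands in for it here):

```julia
# Hypothetical sketch of the two required pieces. The registry and the
# requirement description are illustrative, not Banyan's real API.

# 1. A function that implements the splitting.
function even_split(data, worker_idx::Int, nworkers::Int)
    n = length(data)
    lo = div((worker_idx - 1) * n, nworkers) + 1
    hi = div(worker_idx * n, nworkers)
    data[lo:hi]
end

# 2. A description of the partition type required of any data
#    that this function may split.
const PARTITIONING_FUNCTIONS = Dict(
    (kind = :splitting, required_pt = "Blocked{balanced}") => even_split,
)
```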
Why Write Partitioning Functions?
Partitioning functions are dynamically dispatched based on the partition types of data. This dynamic dispatch allows splitting/merging/casting functions to specialize their implementation for the particular partition type that they require or the location of the data being split/merged.
For example, data with a partition type of Blocked{balanced} might be split with a splitting function SplitBlock that doesn't require iterating through the data being split and is therefore much less computationally intensive than a function like SplitGroup.
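The cost difference can be sketched as follows. The function bodies here are hypothetical (only the names SplitBlock and SplitGroup come from the text): splitting balanced blocked data needs only index arithmetic, while grouped splitting must touch every element.

```julia
# Illustrative contrast; implementations are hypothetical.

# Index arithmetic only: compute slice bounds and return a view,
# never iterating over the elements being split.
function splitblock(data::Vector, worker_idx::Int, nworkers::Int)
    n = length(data)
    view(data, div((worker_idx - 1) * n, nworkers) + 1 : div(worker_idx * n, nworkers))
end

# O(n): every element must be hashed to find its destination worker.
function splitgroup(data::Vector, worker_idx::Int, nworkers::Int)
    [x for x in data if mod(hash(x), nworkers) + 1 == worker_idx]
end
```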
As another example, data with a location of HDF5 might be split by a splitting function ReadHDF5 that leverages Parallel HDF5 for platform-optimized reading and writing of HDF5 datasets parallelized across workers.
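As a simplified sketch of the idea (assuming the HDF5.jl package; the file path, dataset name, and function name are hypothetical, and a real ReadHDF5 would use Parallel HDF5/MPI rather than independent reads), each worker can read only its own contiguous slice of a dataset:

```julia
# Hypothetical sketch: each worker reads only its slice of a 1-D HDF5 dataset.
using HDF5

function read_my_slice(path::String, dset_name::String,
                       worker_idx::Int, nworkers::Int)
    h5open(path, "r") do f
        dset = f[dset_name]
        n = length(dset)
        lo = div((worker_idx - 1) * n, nworkers) + 1
        hi = div(worker_idx * n, nworkers)
        dset[lo:hi]   # only this slice is read from disk
    end
end
```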