Welcome to Banyan

Why Banyan?

Banyan helps your data team use familiar APIs to process massive datasets with lower cloud costs and a smaller ecological footprint. Because Banyan lets you switch instantly between big data in the cloud and small data samples on local workstations (like laptops), you no longer have to run all your data workloads in cloud data centers.

What is Banyan?

Banyan is a comprehensive suite of libraries for large-scale data processing with automatic instant data sampling. While we are currently working on more extensive benchmarks, our preliminary results show that Banyan's performance is comparable to that of Dask (Coiled) for a common data analytics task.

Is Banyan Easy to Adopt?

Yes! Read on to find out how to get started. If you are unsure, please contact us at support@banyancomputing.com, reach us on the Banyan Users Slack, or schedule a meeting.

Where Do We Get Started?

Getting started is a matter of simply setting up AWS and Banyan accounts. Follow the steps here to get started.

After you get started, you can:

When starting a session, you may want to:

Then take a look at our documentation to start writing code for processing massive datasets with Banyan:

Banyan Julia
- Banyan.jl for custom scripting
- BanyanArrays.jl for arrays
- BanyanDataFrames.jl for data frames
- BanyanImages.jl for images
- BanyanHDF5.jl for HDF5 datasets
- BanyanONNXRunTime.jl for running PyTorch/TensorFlow models

Banyan Python
- banyan-python for custom scripting
- banyan-polars (in development) for data frames

You can also take a look at some Jupyter Notebooks with example applications of Banyan Julia and of Banyan Python. These notebooks walk you through the whole process described above: creating a cluster, starting a session, and running large-scale computation and sampling.