Welcome to Banyan 👋

Why Banyan?

Banyan helps your data team use familiar APIs to process massive datasets with lower cloud costs and a smaller ecological footprint. Because Banyan lets you instantly switch between big data in the cloud and small data samples on local workstations (like laptops), you no longer have to run all your data workloads in cloud data centers.

What is Banyan?

Banyan is a comprehensive suite of libraries for large-scale data processing with automatic instant data sampling. While we are currently working on more extensive benchmarks, our preliminary results show that Banyan's performance is comparable to that of Dask (Coiled) for a common data analytics task.

Is Banyan Easy to Adopt?

Yes! Read on to find out how to get started. If you are unsure, please contact us, reach us on the Banyan Users Slack, or schedule a meeting.

Where Do We Get Started?

Getting started is a matter of simply setting up AWS and Banyan accounts. Follow the steps here to get started.

After you get started, you can:

  1. add team members
  2. create clusters
  3. manage them
  4. start sessions

When starting a session, you may want to:

Then take a look at our documentation to start writing code for processing massive datasets with Banyan:

You can also take a look at some Jupyter Notebooks with example applications of Banyan Julia and of Banyan Python. These notebooks walk you through the whole process described above: creating a cluster, starting a session, and running large-scale computation and sampling.