Release Notes

June 27, 2022

It's been nearly 4 months since our last release. What have we been up to?

Charted a vision that establishes Banyan as the eco-friendly large-scale data science platform.
Improved Banyan's ability to automatically maintain instantly available data samples.
Achieved performance comparable to Dask (Coiled) on a common data analytics task.
Launched Banyan Python Custom Scripting.

If this sounds interesting to you, please contact us at support@banyancomputing.com, reach us on the Banyan Users Slack, or schedule a meeting.

February 28, 2022

BanyanArrays.jl has several new features:

Ranges for collecting Julia ranges into Banyan Julia arrays (e.g., BanyanArrays.collect(1:10:10_000_000))
getindex for indexing Banyan arrays (e.g., array[:, :, [2, 6], 1:10])
compute_inplace for computing some value without returning it to the client side (e.g., compute_inplace(results))
BanyanHDF5.jl for parallel HDF5 reading and writing (MPI-parallel HDF5 is compiled only if used)

We designed these features specifically for a satellite imagery use case in which each slice of an array is written to its own location in the s3/ directory. This can now be done by calling map to convert a 4D array into a Vector of 3D image slices, using a range to enumerate the images, calling map again to write each image to s3/, and calling compute_inplace to compute the result of that map without returning it to the client side.
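To make the shape of that pipeline concrete, here is a minimal single-machine sketch in plain Julia. It does not use Banyan at all: it assumes the last dimension of the 4D array indexes the images, it writes raw binary files to a local temporary directory standing in for s3/, and the helper name write_slices and the file names are hypothetical.

```julia
# Local, single-process sketch of the slice-writing workflow (no Banyan).
# Assumptions: the 4th dimension indexes images; a local directory stands
# in for s3/; `write_slices` and the file names are hypothetical.
function write_slices(A::AbstractArray{T,4}, outdir::AbstractString) where {T}
    mkpath(outdir)
    # map: convert the 4D array into a Vector of 3D image slices
    slices = map(i -> A[:, :, :, i], axes(A, 4))
    # range + map: enumerate the images and write each one to its own file
    paths = map(1:length(slices)) do i
        path = joinpath(outdir, "image_$(i).bin")
        open(path, "w") do io
            write(io, slices[i])  # raw bytes of the 3D slice, column-major
        end
        path
    end
    return paths
end

A = rand(Float32, 4, 4, 3, 5)
paths = write_slices(A, mktempdir())
```

In this local version every step simply runs eagerly; in the Banyan workflow described above, the final compute_inplace call is what computes the writing map without sending its result back to the client.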

We also fixed several bugs across Banyan.jl, BanyanArrays.jl, and BanyanDataFrames.jl. Finally, we have continued comparing Banyan's performance against other large-scale cloud computing packages and will share the results soon.

February 9, 2022

Happy February! We're excited to announce the launch of Banyan Custom Scripting, BanyanImages.jl, and BanyanONNXRunTime.jl. With Banyan's growing suite of annotated software libraries, we now enable entirely new use cases, including large-scale image processing, ML inference, and simulations, all using the modern Julia language.

Custom Scripting

Banyan.jl v0.3.0 introduces custom scripting. It has never been easier to run custom scripts in a modern programming language like Julia with instant parallelism and easy access to cloud storage.

BanyanImages.jl and BanyanONNXRunTime.jl

We are so excited to announce the v0.1.0 initial releases of BanyanImages.jl and BanyanONNXRunTime.jl! BanyanImages.jl lets you use the familiar Images.jl Julia image processing library for processing massive datasets of images from the Internet or in the cloud.

BanyanONNXRunTime.jl enables you to combine the power of PyTorch and TensorFlow with the expressiveness of Julia. Specifically, you can run PyTorch or TensorFlow models on Banyan Julia arrays loaded from HDF5 or from image datasets.

Stability Improvements

In our January 23 release, we introduced "sessions" and more accurate estimation of memory usage. In this release, we significantly improved the stability of these features and stress-tested them on larger datasets. These stability improvements are available in BanyanArrays.jl v0.2.0 and BanyanDataFrames.jl v0.2.0.

January 23, 2022

Banyan.jl v0.2.4 and BanyanDataFrames.jl v0.1.5

In this release, we focused on faster startup time, better scalability, and support for parameter tuning use cases with BanyanArrays.map.

Quicker initial startup with sessions (10s - 30 min)
"Session ready" notifications via email
compute for computing futures and returning the value
Less precompilation overhead
Better scalability
Bug fixes

BanyanArrays.jl v0.1.7

Support For Any Data Type

BanyanArrays.jl now supports arbitrary data types and separately defined structs and functions. For example, you can define structs and functions in separate Julia files and then pass the paths to those files in code_files when you call start_session.

Better Support for Parameter Tuning

With two new features, Banyan Julia becomes the perfect option for massively parallel parameter tuning use cases (e.g., hyperparameter optimization and genetic algorithms), where a modern and expressive programming language like Julia is extremely valuable:

force_parallelism when running map
convert(Banyan.Array, A::AbstractArray) and convert(Banyan.Array, df::AbstractDataFrame)
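The parameter tuning pattern these features target boils down to mapping an independent evaluation over a grid of candidate parameters. The sketch below shows that pattern in plain, single-process Julia; the objective loss and the grid values are hypothetical stand-ins, and it does not show how force_parallelism or convert are called, since their exact usage is not given here.

```julia
# Local, single-process sketch of a parameter sweep expressed as a map.
# Assumption: `loss` is a hypothetical objective, minimized at
# lr = 0.1, reg = 0.01; no Banyan APIs are used.
loss(lr, reg) = (lr - 0.1)^2 + (reg - 0.01)^2

# Grid of (learning rate, regularization) candidates, flattened to a Vector.
params = [(lr, reg) for lr in (0.01, 0.1, 1.0) for reg in (0.0, 0.01, 0.1)]

# Each candidate is evaluated independently, which is what makes
# this step a natural fit for a massively parallel map.
scores = map(p -> loss(p...), params)

best = params[argmin(scores)]
```

Because every evaluation is independent, the only coordination needed is the final argmin over the collected scores.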

December 16, 2021

Announcing support for Banyan across North America, Latin America, and Europe! Specifically, you can now use Banyan and launch clusters and jobs in the following AWS regions:

us-east-1 (N. Virginia) and us-east-2 (Ohio)
us-west-1 (N. California) and us-west-2 (Oregon)
eu-central-1 (Frankfurt) and eu-west-2 (London)
sa-east-1 (São Paulo)

November 29, 2021

In this release, we fixed several minor issues with:

Collecting samples of missing data
Larger-than-memory datasets getting written in batches
Type instability in many partitioning functions
Creating clusters in regions other than US West 2
Specifying the region of a cluster with create_cluster

November 3, 2021

We are excited to unveil BanyanDataFrames.jl, the easiest way to unleash Julia on massive tabular datasets for efficient processing and analytics. Some highlights of this release are:

BanyanDataFrames.jl v0.1.0
- Groupby-aggregations with the familiar DataFrames.jl split-apply-combine API
- Reading and writing CSV, Parquet, and Arrow datasets
- Example notebooks
BanyanArrays.jl v0.1.3 and Banyan.jl v0.2.0
- Major bug fixes and increased stability
- Automatic remote installation and usage of packages being used locally
- The ability to wait for a cluster or job to become ready
- Simpler integration with S3
- Windows support

August 5, 2021

Banyan.jl v0.1.2

For this release, we fixed several issues with the way Banyan orchestrates clusters, jobs, and users. We also added more documentation.

BanyanArrays.jl v0.1.2

We improved documentation and began extensive testing of BanyanDataFrames.jl. So far, we have successfully tested distributed group-by aggregation of CSV data stored in Amazon S3 using familiar DataFrames.jl functions.

July 24, 2021

Banyan.jl v0.1.1

This release fixes several bugs that were identified during the development of BanyanArrays.jl.

BanyanArrays.jl v0.1.1

We are thrilled to announce the initial release of BanyanArrays.jl, a scalable version of the Julia standard library's arrays that has been annotated using Banyan.jl. This release includes support for (1) map-reduce computation and (2) reading from and writing to HDF5 datasets either from the Internet (e.g., hosted on GitHub) or in Amazon S3.

June 21, 2021

Banyan.jl v0.1.0

We are excited to announce the release of Banyan.jl, a new framework for annotating Julia code and offloading it to run with automatic parallelization on Banyan-managed clusters. Banyan.jl includes functionality for (1) annotating Julia code, (2) offloading code to jobs, (3) tracking data dependencies with futures, (4) tracking data locations, and (5) managing clusters.