Release Notes
June 27, 2022
It's been nearly 4 months since our last release. What have we been up to?
- Charted a vision that establishes Banyan as the eco-friendly large-scale data science platform.
- Improved Banyan's ability to automatically maintain instantly available data samples.
- Achieved comparable performance to Dask (Coiled) for a common data analytics task.
- Launched Banyan Python Custom Scripting.
If this sounds interesting to you, please contact us at support@banyancomputing.com, reach us on the Banyan Users Slack, or schedule a meeting.
February 28, 2022
BanyanArrays.jl has several new features:
- Ranges for collecting Julia ranges into Banyan Julia arrays (e.g., `BanyanArrays.collect(1:10:10_000_000)`)
- `getindex` for indexing Banyan arrays (e.g., `array[:, :, [2, 6], 1:10]`)
- `compute_inplace` for computing some value without returning it to the client side (e.g., `compute_inplace(results)`)
- BanyanHDF5.jl for parallel HDF5 reading and writing (MPI-parallel HDF5 is only compiled if used)
We specifically designed this for a satellite imagery use-case where we want to write each slice of an array to a location in the `s3/` directory. This can now be done with a call to `map` to convert a 4D array into a `Vector` of 3D image slices, a range to enumerate all the images, another `map` to write each image to `s3/`, and `compute_inplace` to compute the result of `map` without returning it to the client side.
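The shape of that workflow can be sketched locally in plain Julia. This is only an illustration under stated assumptions: it uses an ordinary in-memory `Array` rather than a Banyan array, a temporary directory stands in for `s3/`, and the array dimensions and file names are hypothetical. No Banyan calls appear here, so `compute_inplace` is not shown; with ordinary arrays `map` runs eagerly.

```julia
# Plain-Julia analogue of the slice-and-write workflow (no Banyan calls).
# Hypothetical image stack: height x width x channels x number of images.
A = rand(Float32, 4, 4, 3, 5)

# Step 1: convert the 4D array into a Vector of 3D image slices.
slices = map(i -> A[:, :, :, i], 1:size(A, 4))

# Step 2: a range that enumerates all the images.
ids = 1:length(slices)

# Step 3: write each slice to its own file.
# A temporary directory stands in for the s3/ directory.
outdir = mktempdir()
paths = map(ids, slices) do i, img
    path = joinpath(outdir, "image_$(i).bin")  # hypothetical naming scheme
    write(path, img)                           # raw binary dump of the slice
    path
end

println("wrote ", length(paths), " files to ", outdir)
```

Pairing the range with the slices in a single `map` gives each written file a stable index, which is the role the enumerating range plays in the description above.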
We also fixed several bugs across Banyan.jl, BanyanArrays.jl, and BanyanDataFrames.jl. Finally, we have continued work on a performance comparison of Banyan against other large-scale cloud computing packages and will be sharing the results soon.
February 9, 2022
Happy February! We're excited to announce the launch of Banyan Custom Scripting, BanyanImages.jl, and BanyanONNXRunTime.jl. With Banyan's growing suite of annotated software libraries, we now enable entirely new use-cases including large-scale image processing, ML inference, and simulations -- all using the modern Julia language.
Custom Scripting
Banyan.jl v0.3.0 introduces custom scripting. It has never been easier to run custom scripts in a modern programming language like Julia with instant parallelism and easy access to cloud storage.
BanyanImages.jl and BanyanONNXRunTime.jl
We are so excited to announce the v0.1.0 initial releases of BanyanImages.jl and BanyanONNXRunTime.jl! BanyanImages.jl lets you use the familiar Images.jl Julia image processing library for processing massive datasets of images from the Internet or in the cloud.
BanyanONNXRunTime.jl enables you to combine the power of PyTorch and TensorFlow with the expressiveness of Julia. Specifically, you can run PyTorch or TensorFlow models on Banyan Julia arrays loaded from HDF5 or from image datasets.
Stability Improvements
In our January 23 release, we introduced "sessions" and more accurate estimation of memory usage. In this release we significantly increased the stability of these features and also stress tested on larger datasets. These improvements in stability can be found in BanyanArrays.jl v0.2.0 and BanyanDataFrames.jl v0.2.0.
January 23, 2022
Banyan.jl v0.2.4 and BanyanDataFrames.jl v0.1.5
In this release we focused on faster startup time, better scalability, and supporting parameter tuning use-cases with `BanyanArrays.map`.
- Quicker initial startup with sessions (10s - 30 min)
- "Session ready" notifications via email
- `compute` for computing futures and returning the value
- Less precompilation overhead
- Better scalability
- Bug fixes
BanyanArrays.jl v0.1.7
Support For Any Data Type
BanyanArrays.jl now supports arbitrary data types and separately defined `struct`s and `function`s. You may, for example, define `struct`s and `function`s in separate `include`d Julia files and then specify the paths to the files in `code_files` when you `start_session`.
Better Support for Parameter Tuning
With two new features, Banyan Julia becomes the perfect option for massively parallel parameter tuning use-cases (e.g., hyperparameter optimization, genetic algorithms, etc.) where a modern and expressive programming language like Julia is extremely valuable:
- `force_parallelism` when running `map`
- `convert(Banyan.Array, A::AbstractArray)` and `convert(Banyan.Array, df::AbstractDataFrame)`
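For readers unfamiliar with the pattern, a parameter sweep expressed as a `map` over a grid looks roughly like the following in plain Julia. This is a local sketch, not Banyan code: the `loss` function and the parameter names `lr` and `depth` are hypothetical stand-ins, and no Banyan-specific options such as `force_parallelism` are used. Each grid point is evaluated independently, which is what makes the workload easy to parallelize.

```julia
# Plain-Julia sketch of a grid search written as a map (no Banyan calls).

# Hypothetical objective: smaller is better, minimized at lr = 0.1, depth = 3.
loss(p) = (p.lr - 0.1)^2 + (p.depth - 3)^2

# Every combination of the candidate parameter values, as a Vector of NamedTuples.
grid = [(lr = lr, depth = d) for lr in (0.01, 0.1, 1.0) for d in 1:5]

# Evaluate every candidate independently; this is the step that parallelizes.
scores = map(loss, grid)

# Pick the candidate with the lowest score.
best_score, best_idx = findmin(scores)
best = grid[best_idx]

println("best parameters: ", best, " with loss ", best_score)
```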
December 16, 2021
Announcing support for Banyan across North America, Latin America, and Europe! Specifically, we now support general usage of Banyan and launching clusters and jobs in AWS data centers in:
- us-east-1 (N. Virginia) and us-east-2 (Ohio)
- us-west-1 (N. California) and us-west-2 (Oregon)
- eu-central-1 (Frankfurt) and eu-west-2 (London)
- sa-east-1 (São Paulo)
November 29, 2021
In this release, we fixed several minor issues with:
- Collecting samples of missing data
- Larger-than-memory datasets getting written in batches
- Type instability in many partitioning functions
- Creating clusters in regions other than US West 2
- Specifying the region of a cluster with `create_cluster`
November 3, 2021
We are excited to unveil BanyanDataFrames.jl - the easiest way to unleash Julia on massive tabular datasets for efficient processing and analytics. Some highlights of this release are:
- BanyanDataFrames.jl v0.1.0
  - Groupby-aggregations with the familiar DataFrames.jl split-apply-combine API
  - Reading and writing CSV, Parquet, and Arrow datasets
  - Example notebooks
- BanyanArrays.jl v0.1.3 and Banyan.jl v0.2.0
  - Major bug fixes and increased stability
  - Automatic remote installation and usage of packages being used locally
  - The ability to wait for a cluster or job to become ready
  - Simpler integration with S3
  - Windows support
August 5, 2021
Banyan.jl v0.1.2
For this release, we fixed several issues with the way Banyan orchestrates clusters, jobs, and users. We also added more documentation.
BanyanArrays.jl v0.1.2
We improved documentation and began extensive testing of BanyanDataFrames.jl. So far, we have successfully tested distributed group-by aggregation of CSV data stored in Amazon S3 using familiar DataFrames.jl functions.
July 24, 2021
Banyan.jl v0.1.1
This release fixes several bugs that were identified during the development of BanyanArrays.jl.
BanyanArrays.jl v0.1.1
We are thrilled to announce the initial release of BanyanArrays.jl, a scalable version of the Julia standard library's arrays that has been annotated using Banyan.jl. This release includes support for (1) map-reduce computation and (2) reading from and writing to HDF5 datasets either from the Internet (e.g., hosted on GitHub) or in Amazon S3.
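As a reminder of what map-reduce computation means for arrays, here is a minimal sketch using only Julia's Base library on an ordinary `Vector`. It does not use BanyanArrays.jl, and the data is a hypothetical stand-in; it only illustrates the map step followed by the reduce step.

```julia
# Plain-Julia map-reduce sketch (Base only; no Banyan calls).
xs = collect(1.0:1_000.0)   # hypothetical data: 1.0, 2.0, ..., 1000.0

# Map step: square each element. Reduce step: sum the squares.
sum_of_squares = mapreduce(x -> x^2, +, xs)

# The same computation written as two explicit phases.
two_phase = reduce(+, map(x -> x^2, xs))

println(sum_of_squares == two_phase)
```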
June 21, 2021
Banyan.jl v0.1.0
We are excited to announce the release of Banyan.jl, a new framework for annotating and offloading Julia code to run automatically parallelized on Banyan-managed clusters. Banyan.jl includes functionality for (1) annotation of Julia code, (2) offloading code to jobs, (3) tracking data dependencies with futures, (4) tracking data locations, and (5) managing clusters.