
Getting Started

Ready to start using Banyan to process large-scale datasets with automatic instant data sampling?

Getting started with Banyan is easy! Follow this tutorial to set up your account with Banyan, view the Banyan dashboard, and install the Julia client library and/or the Python client library. Then you can create an auto-scaling cluster and start running sessions that process numerical data, tabular data, and images, and that run ML models and custom scripts.

Setting up Your Account

Follow the steps in this section to set up your Banyan account so your team can start running computing sessions on auto-scaling clusters in the cloud.

If your organization already has a Banyan account, you can skip the steps in this section and simply join as a member and then learn how to use the dashboard, the Julia client library, and the Python client library.

Your AWS Account

Banyan uses AWS to manage underlying compute resources. If you don't already have an AWS account, please follow the instructions here to create an AWS account.

Your Banyan Account

  1. Create a new Banyan account here.
  2. Fill out the fields with your information. If you plan to use this account as an individual, you can use a unique name, such as your full name, for the "Organization Name".
  3. A temporary password will be sent to your email address. Check your email for a message from no-reply@verificationemail.com containing your temporary Banyan password.
  4. Log in here with your username and the temporary password from the email. You will be redirected to set a new password.
  5. Enter your new password.
  6. You can now sign into the Banyan dashboard here!

Connecting Your Banyan and AWS Accounts

To use Banyan, you must first grant Banyan sufficient permissions to create and manage compute resources in your AWS account. Banyan uses a cross-account role to connect your Banyan account to your AWS account. You must configure settings both in the AWS Management Console and in the Banyan dashboard. Once you have connected your AWS account to Banyan, you cannot associate your Banyan account with a different AWS account.

  1. Sign into your Banyan account and go to the Banyan Dashboard.
  2. Click on Account and scroll down to the Connect to AWS section.
  3. Copy the External ID for use in step 9, and do not close this page.
  4. Sign into the AWS Management Console and go to the IAM service.
  5. Click the Policies tab in the sidebar.
  6. Click Create Policy to create a new IAM policy.

    a. Click the JSON tab.
    b. Paste the following policy into the editor.


    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BanyanGeneral",
                "Effect": "Allow",
                "Action": [
                    "apigateway:*",
                    "autoscaling:DescribeAutoScalingGroups",
                    "autoscaling:DescribeAutoScalingInstances",
                    "autoscaling:CreateAutoScalingGroup",
                    "autoscaling:PutNotificationConfiguration",
                    "autoscaling:UpdateAutoScalingGroup",
                    "autoscaling:PutScalingPolicy",
                    "autoscaling:DescribeScalingActivities",
                    "autoscaling:DeleteAutoScalingGroup",
                    "autoscaling:DeletePolicy",
                    "autoscaling:DisableMetricsCollection",
                    "autoscaling:EnableMetricsCollection",
                    "autoscaling:DescribeTags",
                    "autoscaling:SetDesiredCapacity",
                    "autoscaling:SetInstanceHealth",
                    "autoscaling:TerminateInstanceInAutoScalingGroup",
                    "ce:GetCostAndUsage",
                    "cloudformation:DescribeStackEvents",
                    "cloudformation:DescribeStackResource",
                    "cloudformation:DescribeStackResources",
                    "cloudformation:DescribeStacks",
                    "cloudformation:ListStacks",
                    "cloudformation:GetTemplate",
                    "cloudformation:CreateChangeSet",
                    "cloudformation:CreateStack",
                    "cloudformation:DeleteStack",
                    "cloudformation:UpdateStack",
                    "cloudwatch:PutDashboard",
                    "cloudwatch:ListDashboards",
                    "cloudwatch:DeleteDashboards",
                    "cloudwatch:GetDashboard",
                    "dynamodb:DescribeTable",
                    "dynamodb:ListTagsOfResource",
                    "dynamodb:CreateTable",
                    "dynamodb:DeleteTable",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:Query",
                    "dynamodb:TagResource",
                    "dynamodb:DeleteItem",
                    "dynamodb:ListTables",
                    "ec2:CreateLaunchTemplate",
                    "ec2:CreateLaunchTemplateVersion",
                    "ec2:ModifyLaunchTemplate",
                    "ec2:DeleteLaunchTemplate",
                    "ec2:DescribeLaunchTemplates",
                    "ec2:DescribeLaunchTemplateVersions",
                    "ec2:DescribeKeyPairs",
                    "ec2:DescribeRegions",
                    "ec2:DescribeVpcs",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeSecurityGroups",
                    "ec2:DescribePlacementGroups",
                    "ec2:DescribeImages",
                    "ec2:DescribeInstances",
                    "ec2:DescribeInstanceStatus",
                    "ec2:DescribeInstanceTypes",
                    "ec2:DescribeInstanceTypeOfferings",
                    "ec2:DescribeSnapshots",
                    "ec2:DescribeVolumes",
                    "ec2:DescribeVpcAttribute",
                    "ec2:DescribeAddresses",
                    "ec2:CreateTags",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeAvailabilityZones",
                    "ec2:CreateVpc",
                    "ec2:ModifyVpcAttribute",
                    "ec2:DescribeNatGateways",
                    "ec2:CreateNatGateway",
                    "ec2:DescribeInternetGateways",
                    "ec2:CreateInternetGateway",
                    "ec2:AttachInternetGateway",
                    "ec2:DescribeRouteTables",
                    "ec2:CreateRoute",
                    "ec2:CreateRouteTable",
                    "ec2:AssociateRouteTable",
                    "ec2:CreateSubnet",
                    "ec2:ModifySubnetAttribute",
                    "ec2:CreateVolume",
                    "ec2:RunInstances",
                    "ec2:AllocateAddress",
                    "ec2:AssociateAddress",
                    "ec2:AttachNetworkInterface",
                    "ec2:AuthorizeSecurityGroupEgress",
                    "ec2:AuthorizeSecurityGroupIngress",
                    "ec2:CreateNetworkInterface",
                    "ec2:CreateSecurityGroup",
                    "ec2:ModifyVolumeAttribute",
                    "ec2:ModifyNetworkInterfaceAttribute",
                    "ec2:DeleteNetworkInterface",
                    "ec2:DeleteVolume",
                    "ec2:TerminateInstances",
                    "ec2:DeleteSecurityGroup",
                    "ec2:DisassociateAddress",
                    "ec2:RevokeSecurityGroupIngress",
                    "ec2:RevokeSecurityGroupEgress",
                    "ec2:ReleaseAddress",
                    "ec2:CreatePlacementGroup",
                    "ec2:DeletePlacementGroup",
                    "ec2:DescribeNetworkInterfaceAttribute",
                    "ec2:AttachVolume",
                    "ec2:DescribeInstanceAttribute",
                    "ec2messages:AcknowledgeMessage",
                    "ec2messages:DeleteMessage",
                    "ec2messages:FailMessage",
                    "ec2messages:GetEndpoint",
                    "ec2messages:GetMessages",
                    "ec2messages:SendReply",
                    "ecr:CreateRepository",
                    "ecr:DeleteRepository",
                    "ecr:DescribeRepositories",
                    "ecr:*",
                    "elasticfilesystem:*",
                    "execute-api:Invoke",
                    "fsx:*",
                    "iam:DeleteRole",
                    "iam:CreateInstanceProfile",
                    "iam:DeleteInstanceProfile",
                    "iam:AddRoleToInstanceProfile",
                    "iam:RemoveRoleFromInstanceProfile",
                    "iam:CreatePolicy",
                    "iam:DeletePolicy",
                    "iam:GetRolePolicy",
                    "iam:GetPolicy",
                    "iam:ListAttachedRolePolicies",
                    "iam:ListPolicyVersions",
                    "imagebuilder:*",
                    "lambda:CreateFunction",
                    "lambda:DeleteFunction",
                    "lambda:GetFunctionConfiguration",
                    "lambda:GetFunction",
                    "lambda:InvokeFunction",
                    "lambda:AddPermission",
                    "lambda:RemovePermission",
                    "logs:DeleteLogGroup",
                    "logs:PutRetentionPolicy",
                    "logs:DescribeLogGroups",
                    "logs:CreateLogGroup",
                    "route53:ChangeResourceRecordSets",
                    "route53:ChangeTagsForResource",
                    "route53:CreateHostedZone",
                    "route53:DeleteHostedZone",
                    "route53:GetChange",
                    "route53:GetHostedZone",
                    "route53:ListResourceRecordSets",
                    "route53:ListQueryLoggingConfigs",
                    "s3:*",
                    "sns:ListTopics",
                    "sns:GetTopicAttributes",
                    "sns:CreateTopic",
                    "sns:Subscribe",
                    "sns:Unsubscribe",
                    "sns:DeleteTopic",
                    "SNS:Publish",
                    "sqs:GetQueueAttributes",
                    "sqs:CreateQueue",
                    "sqs:DeleteMessage",
                    "sqs:DeleteQueue",
                    "sqs:GetQueueUrl",
                    "sqs:ListQueues",
                    "sqs:ReceiveMessage",
                    "sqs:SendMessage",
                    "sqs:SetQueueAttributes",
                    "sqs:TagQueue",
                    "sqs:ChangeMessageVisibility",
                    "ssm:*",
                    "ssmmessages:CreateControlChannel",
                    "ssmmessages:CreateDataChannel",
                    "ssmmessages:OpenControlChannel",
                    "ssmmessages:OpenDataChannel"
                ],
                "Resource": [
                    "*"
                ]
            },
            {
                "Sid": "IAMRole",
                "Effect": "Allow",
                "Action": [
                    "iam:AttachRolePolicy",
                    "iam:DetachRolePolicy",
                    "iam:PassRole",
                    "iam:CreateRole",
                    "iam:CreateServiceLinkedRole",
                    "iam:DeleteRole",
                    "iam:GetRole",
                    "iam:TagRole",
                    "iam:SimulatePrincipalPolicy",
                    "iam:PutRolePolicy",
                    "iam:DeleteRolePolicy"
                ],
                "Resource": [
                    "arn:aws:iam::*:role/parallelcluster*",
                    "arn:aws:iam::*:role/aws-service-role/*",
                    "arn:aws:iam::*:role/*banyan-pcluster3-stack*"
                ]
            },
            {
                "Condition": {
                    "ArnLike": {
                        "iam:PolicyARN": [
                            "arn:aws:iam::*:policy/parallelcluster*",
                            "arn:aws:iam::*:policy/parallelcluster/*",
                            "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
                            "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
                            "arn:aws:iam::aws:policy/AWSBatchFullAccess",
                            "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
                            "arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole",
                            "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
                            "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
                            "arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole",
                            "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder",
                            "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
                            "arn:aws:iam::*:policy/Banyan*",
                            "arn:aws:iam::*:policy/banyan-pcluster3-stack*"
                        ]
                    }
                },
                "Action": [
                    "iam:AttachRolePolicy",
                    "iam:DetachRolePolicy"
                ],
                "Resource": [
                    "arn:aws:iam::*:role/parallelcluster/*"
                ],
                "Effect": "Allow",
                "Sid": "IamPolicy"
            }
        ]
    }
    


    c. Click Next: Tags, then Next: Review.
    d. In the Name field, enter BanyanAccessPolicy.
    e. Scroll down and click Create Policy.

  7. Return to the IAM service in the AWS Management Console.

  8. Click on the Roles tab in the sidebar.
  9. Select Create Role to create a new IAM role.
    a. Under Select type of trusted entity, select Another AWS account.
    b. In the Account ID field, enter the Banyan AWS account ID, 814618936233.
    c. Select Require external ID.
    d. In the External ID field, enter the external ID you copied from the Banyan Dashboard in step 3.
    e. Click Next: Permissions.
    f. Search for BanyanAccessPolicy, and select it.
    g. Click Next: Tags.
    h. Click Next: Review.
    i. In the Role name field, enter BanyanAccessRole.
    j. Click Create role to create the role.
  10. Search for the role you just created, click on it, and copy the Role ARN.
  11. Return to the Account page in the Banyan dashboard where you copied the external ID in step 3.
  12. Enter the role ARN you copied in the previous step.
  13. Click Connect.
  14. The form should now be grayed out. Your Banyan account is now connected with your AWS account!
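If you prefer to script steps 4 to 10 instead of clicking through the console, the sketch below shows one possible approach using the AWS SDK for Python (boto3). It assumes boto3 is installed, that your local AWS credentials are allowed to create IAM policies and roles, and that you saved the BanyanAccessPolicy JSON from step 6 to a local file; the file name `banyan_access_policy.json` and both helper functions are hypothetical, and this is not an official Banyan tool. The trust policy it builds is the standard cross-account form the console generates when you choose Another AWS account with Require external ID.

```python
import json
from pathlib import Path

BANYAN_AWS_ACCOUNT_ID = "814618936233"  # Banyan's AWS account ID from step 9b


def build_trust_policy(external_id: str, account_id: str = BANYAN_AWS_ACCOUNT_ID) -> dict:
    """Trust policy that lets the given AWS account assume the role, but only
    when it presents the matching external ID (cross-account role pattern)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def create_banyan_role(external_id: str, policy_path: str = "banyan_access_policy.json") -> str:
    """Create BanyanAccessPolicy and BanyanAccessRole, attach the policy to the
    role, and return the role ARN to paste into the Banyan dashboard."""
    import boto3  # imported here so build_trust_policy works without boto3

    iam = boto3.client("iam")
    policy = iam.create_policy(
        PolicyName="BanyanAccessPolicy",
        PolicyDocument=Path(policy_path).read_text(),
    )
    role = iam.create_role(
        RoleName="BanyanAccessRole",
        AssumeRolePolicyDocument=json.dumps(build_trust_policy(external_id)),
    )
    iam.attach_role_policy(
        RoleName="BanyanAccessRole", PolicyArn=policy["Policy"]["Arn"]
    )
    return role["Role"]["Arn"]
```

For example, `print(create_banyan_role("<external ID from the dashboard>"))` prints the Role ARN to enter in step 12. The dashboard form in steps 11 to 13 is still where the connection is completed.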

Billing

To use your Banyan account, you must first set up billing here. See here for more details on pricing.

  1. Sign into your Banyan account and go to the Banyan Dashboard.
  2. Click on Billing.
  3. Enter the details of the credit card you would like to use for billing. An invoice will be billed to this credit card automatically each month.
  4. You can change the credit card used for billing by filling out the form again.

Use the Dashboard

Visit banyancomputing.com/dashboard to:

  • Manage auto-scaling clusters that run in the cloud and provide the compute power for running computing sessions at any scale.
  • View computing sessions (info, logs, etc.) both past and currently running.
  • View and manage members of your organization's account.
  • Manage your account and billing.

Find out how to create and manage clusters using the dashboard.

Use a Client Library

You use the Julia client library or the Python client library to create "sessions". A session is essentially a connection to a cluster that lets you process data on the cluster. Once a session is started, you may use functions in BanyanArrays.jl or BanyanDataFrames.jl to process data directly on the cluster. Computing is automatically parallelized across nodes that make up the cluster.

Creating a new project with Banyan

You can either create a new Julia project or a new Python project. You can use both Julia and Python to start/manage sessions and run code on the same clusters.

Creating a new Julia project

  1. Download and install Julia.
  2. Download and set up VS Code with the Julia extension. (optional)
  3. Create a Julia file containing code, a Jupyter notebook, or a Julia package. (optional)
  4. Add the Banyan.jl package with ] add Banyan or import Pkg; Pkg.add("Banyan").
  5. Add the BanyanArrays.jl (for scalable array processing) or BanyanDataFrames.jl (for scalable data analytics) packages. (optional)
  6. Follow the steps below to authenticate with Banyan and with AWS when running Banyan Julia scripts.
  7. Start sessions (cluster sessions) to run code with Banyan.

Creating a new Python project

  1. Download and install Python >=3.8.
  2. Install Poetry to create and manage Python projects.
  3. Create a new Poetry project and add a Python file containing code.
  4. Add the banyan-python package to the Poetry project with poetry add banyan-python and import with import banyan as bn.
  5. Follow the steps below to authenticate with Banyan and with AWS when running Banyan Python scripts.
  6. Start sessions (cluster sessions) to run code with Banyan.
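A quick way to confirm that steps 1 and 4 worked is a small standard-library check like the one below, run inside the project with `poetry run python check_env.py` (the file name and function name are hypothetical). It compares the interpreter version against the 3.8 minimum and asks `importlib` whether a module named `banyan` can be found, without actually importing it.

```python
import importlib.util
import sys


def check_banyan_environment(min_version=(3, 8)):
    """Report whether this interpreter meets the minimum Python version and
    whether the `banyan` module (installed by banyan-python) is findable."""
    return {
        "python_version_ok": sys.version_info[:2] >= min_version,
        "banyan_installed": importlib.util.find_spec("banyan") is not None,
    }


if __name__ == "__main__":
    for check, passed in check_banyan_environment().items():
        print(f"{check}: {'OK' if passed else 'MISSING'}")
```

If `banyan_installed` reports MISSING, make sure you are running through `poetry run` (or inside `poetry shell`) so the project's virtual environment is active.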

Authenticating with AWS

To use the client library, you must provide AWS credentials. The client library uses AWS to communicate with the clusters and to manage your data. Your clusters and your data live in the AWS account that you connected with your Banyan account. Provide AWS credentials in the same way you would provide credentials when using the AWS CLI (command-line interface).

You can find your credentials on the AWS console and provide them using the credentials file, config file, or environment variables. Follow the instructions here to authenticate with AWS. See here for information on how to set environment variables to authenticate with AWS in the same terminal/command prompt session where you are running Julia.
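The AWS CLI and SDKs look for credentials in environment variables before falling back to the shared credentials file. The sketch below mimics a simplified part of that lookup with only the standard library, so you can see which source would be picked up before running Banyan. It is an approximation under stated assumptions: it ignores the config file, SSO, and instance roles, and the function name is hypothetical. If boto3 is installed, `boto3.Session().get_credentials()` performs the full lookup.

```python
import configparser
import os
from pathlib import Path


def find_aws_credentials_source(env=None, home=None):
    """Describe where AWS credentials would be found, or return None.
    Simplified: checks environment variables, then the shared credentials file."""
    env = os.environ if env is None else env
    # 1. Environment variables take precedence over files.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 2. Shared credentials file; AWS_SHARED_CREDENTIALS_FILE overrides the path,
    #    and AWS_PROFILE selects the profile (default: "default").
    home = Path.home() if home is None else Path(home)
    path = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", home / ".aws" / "credentials")).expanduser()
    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if parser.has_option(profile, "aws_access_key_id") and parser.has_option(
        profile, "aws_secret_access_key"
    ):
        return f"{path} [{profile}]"
    return None


if __name__ == "__main__":
    print(find_aws_credentials_source() or "no AWS credentials found")
```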

When creating your access keys, ensure that the user has at least the following permissions:

  • s3:*
  • sqs:*
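For reference, an IAM identity policy granting exactly those two permissions could look like the document built below; attach it to the IAM user whose access keys you use. This is a sketch: the helper name is hypothetical, and `"Resource": "*"` mirrors the unrestricted `s3:*` and `sqs:*` listed above, so you may want to scope it to specific buckets and queues once you know their names. The printed JSON can be pasted into the IAM console's JSON policy editor.

```python
import json


def minimal_client_policy(actions=("s3:*", "sqs:*")) -> dict:
    """IAM identity policy allowing the listed actions on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(minimal_client_policy(), indent=2))
```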

To SSH into a cluster from the CLI, ensure that you have the permissions described here. Read here for more information on how to connect to a cluster with SSH.

Authenticating with Banyan

When using a Banyan client library (Julia or Python), you need to authenticate with Banyan by providing your API key and your user ID. Both can be found on the dashboard as described below.

Managing API Keys

To use the client library, you must first generate an API key on the dashboard by navigating to the Account tab. You can regenerate your API key at any time. Note that regenerating the API key invalidates all currently running sessions, so be sure to destroy all running sessions before generating a new key.

Providing Banyan Credentials

To use the client library, you must provide Banyan credentials: your user ID and your API key. To find your user ID and API key, visit the dashboard and go to the Account tab. The latest credentials you provide are saved in $HOME/.banyan/banyanconfig.toml and retrieved for later use if unspecified.

Banyan checks for credentials in three places in the following order.

  1. Function arguments. All functions for cluster and session management accept optional arguments for authenticating with Banyan:

    • user_id (a String)
    • api_key (a String)
  2. Environment variables. Set the BANYAN_USER_ID and BANYAN_API_KEY environment variables.

    export BANYAN_USER_ID=your_user_id
    export BANYAN_API_KEY=your_api_key
    
  3. Configuration file. Set credentials in the Banyan configuration file on your local system, located at $HOME/.banyan/banyanconfig.toml. Do not modify this file directly; instead, use one of the two options above to automatically update this file. This file should contain the configuration information in the following format:

    [banyan]
    api_key = "1234567890abcdefghijklmnopqrstuv"
    user_id = "98876543210zyxwvutsrqponmlkjihgf"
    
    [aws]
    ec2_key_pair_name = "MySSHKeyPair"
    region = "us-west-2"
    

As a best practice, do not hardcode credentials in your code. To configure your credentials for the first time, use configure (Julia) or bn.configure (Python) as follows:

julia> using Banyan
julia> configure(user_id="98876543210zyxwvutsrqponmlkjihgf", api_key="1234567890abcdefghijklmnopqrstuv")
>>> import banyan as bn
>>> bn.configure(user_id="98876543210zyxwvutsrqponmlkjihgf", api_key="1234567890abcdefghijklmnopqrstuv")

What's Next?

After you've followed the above steps you should have the following set up:

  1. Your Banyan account connected to an AWS account
  2. You can access the Banyan dashboard
  3. You can use the Julia client library and/or the Python client library for Banyan
    1. Authenticated with AWS
    2. Authenticated with Banyan
    3. Can create a cluster
    4. Can connect to a cluster by starting a session

Once you have created a cluster and a session, you may use functions in BanyanArrays.jl, BanyanDataFrames.jl, or other Banyan libraries to read, write, and process data directly on the cluster. You may also run custom scripts on the cluster. Once you're finished using the cluster, you should end the session and optionally also destroy the cluster.