Workstations FAQ

This document answers frequently asked questions about workstations and covers other details that are useful to know as a user.

What is a workstation? When should I use it?

A workstation is a cloud-based development environment that runs within your Outerbounds deployment, providing you with full access to your compute resources and data while maintaining security and compliance.

Workstation use cases include:

  1. When it is convenient to run flows locally (without @kubernetes or argo-workflows involved), and there are complicated dependencies that are hard to replicate locally. This is especially useful if eventually (during production when deployed to argo-workflows) tasks will run in specialized docker images; in this case, you can use the task Docker image for the workstation to achieve a complete sync between your dev & prod environments.
  2. When moving a high volume of data across the wire. Since workstations are running in the cloud, they have much better network bandwidth from object storage, compared to a laptop.
  3. When dev environments have big resource requirements, such as needing lots of GPUs, RAM, or disk space that’s not available on your laptop.
  4. When you want a quick REPL/notebook for iterative data work, but the data cannot leave your cloud for privacy or compliance reasons.
  5. For stricter control over IAM roles in dev environments.

What directories are persisted on a workstation across sessions?

Workstations persist the directory /home/ob-workspace. This means any files under this directory will be persisted across workstation hibernates and restarts. Data outside this directory will be lost once a workstation is hibernated.
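
Before hibernating, it can help to check whether a given file lives under the persisted directory. The following is a minimal sketch; the helper name is_persisted is hypothetical and not part of any Outerbounds tooling, and it only inspects the path string.

```python
from pathlib import PurePosixPath

# Directory that survives hibernates and restarts, per the answer above.
PERSISTED_ROOT = PurePosixPath("/home/ob-workspace")


def is_persisted(path: str) -> bool:
    """Return True if `path` is the persisted root or lies beneath it.

    Hypothetical helper for illustration. It compares path components only
    and does not resolve symlinks or ".." segments.
    """
    candidate = PurePosixPath(path)
    return candidate == PERSISTED_ROOT or PERSISTED_ROOT in candidate.parents


print(is_persisted("/home/ob-workspace/project/data.csv"))  # True
print(is_persisted("/tmp/scratch.bin"))                     # False
```

Anything for which this returns False should be moved or copied under /home/ob-workspace if you want to keep it.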

Can I use a custom docker image for workstations?

Yes, the image selector dropdown on the Workstation Create form doubles as an input field, where you can paste the URI of your Docker image, and your workstation will use that base image. You do not need to set up anything related to VSCode in your docker image.

Public ECR is the default if no registry is provided. Use a fully qualified name to specify a different registry. For example, to use Docker Hub, the URI would be docker.io/<my-registry>/<my-image>:<my-tag>.

How can I bake our own workstation images?

You have two options.

  1. Use one of the Outerbounds base workstation images and add your dependencies on top of them, or
  2. bake your docker image from scratch.

If you wish to use the Outerbounds docker image, you can use one of the following two images:

  1. 006988687827.dkr.ecr.us-west-2.amazonaws.com/obp-workstations/python:<LATEST SEMANTIC TAG>
  2. 006988687827.dkr.ecr.us-west-2.amazonaws.com/obp-workstations/nvidia/cuda:<LATEST SEMANTIC TAG>

Both of these docker images come with:

  1. A system-wide installation of python3.
  2. Dependency management tools like conda, mamba.
  3. Default user for all sessions “workstation-user”.
  4. The home directory for workstation-user set to /home/ob-workspace.
  5. The GitHub CLI (gh-cli) to work with GitHub.
  6. Additionally, the nvidia/cuda image contains necessary software for GPU task and workstation runtimes.

If you wish to bake docker images from scratch, keep in mind:

  1. Ensure that the HOME directory of your user in the docker image is set to /home/ob-workspace. This is the directory that’s persisted across hibernates and restarts. Note that you’d have to make sure that the user’s home directory (which would be /home/ob-workspace) also gets reflected in /etc/passwd.
  • If using a non-root user in your Dockerfile, use the useradd command to add a new user and set their home directory: RUN useradd -m -d $USERHOME $USERNAME
  • If using the root user (not recommended), you won’t be able to modify your HOME directory directly. Use something like the following to edit the home directory in /etc/passwd: RUN sed -i 's|root:x:0:0:root:/root|root:x:0:0:root:/home/ob-workspace|' /etc/passwd
  2. You need at least one system-wide installation of python3 available in your docker image.
  3. Optional but recommended: pre-install tools like git/gh to make it easier for your users to authenticate into and work with git.

In both cases, make sure that the image in your Docker repository is configured to be pull-able from your Outerbounds deployment.
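
When baking an image from scratch, one way to confirm that the home directory is reflected in /etc/passwd is to parse the file and compare the home field. This is a sketch under assumptions: the helper names are hypothetical, and the sample line mirrors the result of the sed command shown above rather than output from a real image.

```python
from typing import Optional

# Home directory that workstations persist.
EXPECTED_HOME = "/home/ob-workspace"


def home_from_passwd(passwd_text: str, username: str) -> Optional[str]:
    """Return the home directory recorded for `username`, or None if absent.

    Each /etc/passwd line has seven colon-separated fields:
    name:password:UID:GID:GECOS:home:shell
    """
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= 7 and fields[0] == username:
            return fields[5]
    return None


def home_is_persisted(passwd_text: str, username: str) -> bool:
    """True if the user's passwd entry points at the persisted directory."""
    return home_from_passwd(passwd_text, username) == EXPECTED_HOME


# Sample entry for illustration only; inside a real image you would pass
# the contents of /etc/passwd instead.
sample = "root:x:0:0:root:/home/ob-workspace:/bin/bash"
print(home_is_persisted(sample, "root"))  # True
```

Running a check like this as a build step catches the case where HOME is set as an environment variable but the passwd entry still points elsewhere.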

What are the things installed in my workstation by Outerbounds?

Outerbounds sets up the following things in your workstation.

  1. The Outerbounds CLI and the Outerbounds distribution of the Metaflow Python package. Both of these are installed with pip install outerbounds.
  2. Credentials to access your Outerbounds deployment, which are automatically renewed when they expire.

You do not need to start any VSCode processes on your workstation; they are set up by default.

What IDEs can I use with my workstation?

You can use VSCode or Cursor to access your Outerbounds workstation.

Can I SSH into my workstation?

As of today, you cannot SSH into your workstation, but the functionality is on our roadmap. If you really need to use SSH with your workstation, you can use openssh-server. Please reach out on Slack if you encounter issues or want to talk through longer-term solutions.

How can I set up git access on my workstation?

If you’re using the Outerbounds docker image, it comes with the gh CLI to help manage your login. You can run gh auth login, which kicks off a process to get you authenticated with GitHub. When possible, if you’re using our desktop VSCode extension, we try to forward your git credentials to the workstation to minimize setup steps.

After the workstation has been created, what can I modify in the workstation?

Several fields of the workstation are modifiable. However, some of them can only be modified when the workstation is hibernating.

Mutable properties regardless of workstation state:

  1. Additional Users
  2. Auto-hibernation controls: whether it’s enabled, and the inactivity threshold.

Properties that are only mutable when the workstation is hibernating:

  1. CPU, GPU, Memory, Shared Memory.
  2. Disk-size: You can only increase the disk size of your workstation.
  3. Base Image of the workstation
  4. The compute pool/instance type that’s used for running your workstation.

What qualifies as activity on a workstation for auto-hibernation?

A workstation is considered “active” if either of the following happens within a time window:

  1. A python process is running.
  2. A file on the workstation was changed.
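
The rule above can be modeled as a simple either/or check. This is an illustrative sketch only: the function name and parameters are hypothetical, and how Outerbounds actually detects activity or measures the window is not described here.

```python
def is_active(
    python_process_running: bool,
    seconds_since_last_file_change: float,
    window_seconds: float,
) -> bool:
    """Model of the activity rule: either signal keeps the workstation active.

    Hypothetical helper. `window_seconds` stands in for the inactivity
    threshold configured in the auto-hibernation controls.
    """
    file_changed_in_window = seconds_since_last_file_change <= window_seconds
    return python_process_running or file_changed_in_window


# A running python process counts even if no file has changed for a day.
print(is_active(True, 86400.0, 3600.0))   # True
# No python process and no recent file change: eligible for hibernation.
print(is_active(False, 7200.0, 3600.0))   # False
```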

I want to run a long-running process on the workstation, but won’t be logged in the entire time. Can I do that?

As a rule of thumb, we recommend that you run any long-running flows using argo-workflows, since it offers better reliability.

However, if for some reason you’d like to run this process on your workstation, we recommend running it inside tmux so that it does not depend on you being connected to the workstation.
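
As a sketch of the tmux approach, the snippet below builds the command that starts a process in a detached tmux session. It assumes tmux is installed in your workstation image; the session name and the flow file train_flow.py are hypothetical placeholders.

```python
import shlex
from typing import List


def tmux_detached_command(session_name: str, command: str) -> List[str]:
    """Build a tmux invocation that starts `command` in a detached session.

    `tmux new-session -d -s NAME CMD` creates the session without attaching
    to it, so the process keeps running after you disconnect.
    """
    return ["tmux", "new-session", "-d", "-s", session_name, command]


# Hypothetical session name and command, for illustration.
cmd = tmux_detached_command("training", "python train_flow.py run")
print(shlex.join(cmd))

# To actually launch it from Python on the workstation:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

You can later reattach with tmux attach -t training to inspect the output.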

Can I scp some files into my workstation from my local?

For smaller files (less than 100MB), you can just drag and drop the file into the workstation VSCode window. For larger files, we recommend using Metaflow’s S3 client to store your files in S3, and then pulling them down from S3 into your workstation(s).

We are currently working on functionality to transfer files directly between your workstation and your local machine.
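
The S3 route for larger files could look roughly like the sketch below, using Metaflow’s S3 client (put_files to upload, get to download). The helper names and the s3root value you pass in are hypothetical, treating 100MB as 10**6-byte megabytes is an assumption, and the upload side presumes your local machine has Metaflow configured with access to the bucket.

```python
import os
import shutil

# Threshold from the answer above; the exact byte interpretation is assumed.
DRAG_AND_DROP_LIMIT_BYTES = 100 * 10**6


def transfer_method(size_bytes: int) -> str:
    """Pick the suggested route for a file of the given size."""
    return "drag-and-drop" if size_bytes < DRAG_AND_DROP_LIMIT_BYTES else "s3"


def upload(local_path: str, s3root: str) -> str:
    """Run where the file lives: store it under `s3root` and return its key."""
    from metaflow import S3  # imported lazily; requires Metaflow

    key = os.path.basename(local_path)
    with S3(s3root=s3root) as s3:
        s3.put_files([(key, local_path)])
    return key


def download(key: str, s3root: str, dest_path: str) -> None:
    """Run on the workstation: fetch `key` from `s3root` into `dest_path`."""
    from metaflow import S3

    with S3(s3root=s3root) as s3:
        obj = s3.get(key)
        # The client removes its temporary files when the context exits,
        # so copy the download to its final location first.
        shutil.copy(obj.path, dest_path)


print(transfer_method(5 * 10**6))    # drag-and-drop
print(transfer_method(2 * 10**9))    # s3
```

Choosing a dest_path under /home/ob-workspace keeps the downloaded file across hibernates.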

Why isn’t my workstation coming up?

The typical scenarios that can cause a workstation to take time in coming up are:

  1. Lack of capacity: Outerbounds clusters only keep capacity that is necessary to service currently running jobs. This means that when you are trying to start your workstation, it can often take a little time to first provision capacity for your workstation. The time needed to provision capacity is usually variable depending on your cloud provider, the exact instance type and the availability of said instance type with your provider at any given time.
  2. Large docker image: If you’re using a large docker image, it can take some time to pull it from the configured repository.

In any case, if your workstation takes more than 5 minutes to start, don’t hesitate to reach out to Outerbounds for support!

Can I configure a default IAM role/GCP Service Principal to be used by all processes on my workstation?

By default, all processes on the workstation assume the task role of the currently selected perimeter.

If this isn’t sufficient, you can provision a new role assumable by the task role, and reach out to Outerbounds with this newly provisioned role, to make it the default on all workstations.

What docker image is used when I launch a task with kubernetes from my workstation?

Unless specified otherwise, the docker image used for a task created via --with kubernetes or argo-workflows uses the same image as the workstation. Users can customize this dynamically using @kubernetes(image="docker.io/path/to/my/image", ...) or --with kubernetes:image=docker.io/path/to/my/image.
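
To make the command-line form concrete, here is a small sketch that assembles a run command with an image override. The flow file name my_flow.py is hypothetical, and the image URI is the placeholder from the answer above.

```python
from typing import List


def run_command_with_image(flow_file: str, image: str) -> List[str]:
    """Build `python FLOW run --with kubernetes:image=IMAGE`.

    The decorator form, @kubernetes(image="..."), sets the same override
    on an individual step inside the flow code instead.
    """
    return ["python", flow_file, "run", "--with", f"kubernetes:image={image}"]


cmd = run_command_with_image("my_flow.py", "docker.io/path/to/my/image")
print(" ".join(cmd))
# python my_flow.py run --with kubernetes:image=docker.io/path/to/my/image
```

Omitting the override leaves the task on the same image as the workstation.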

Can I force the workstation to run on a specific instance type?

Yes, when creating the workstation (or updating it) you can select the compute pool that corresponds to your desired instance type, provided that compute pool is configured to allow workstations.

You can go to the compute pools view to see whether a pool supports workstations or not. You can change this property of a compute pool whenever you want.

Can a non-admin create a workstation for themselves, or edit it?

Currently, only admins can create and update workstations for users on the platform.

We are working on functionality that will let any non-admin user provision a workstation for themselves and modify its properties if needed.

How much access does a user get when they’re added as an additional user to my workstation?

Any user that’s added to your workstation has full access to the workstation, including your credentials for the Outerbounds deployment. Any flow they run would show up as having been launched by you, unless they explicitly set up their own Outerbounds config on that workstation.

For this reason, we strongly discourage using the additional users functionality for anything other than temporary debugging.

How do I check the CPU/Memory usage by my workstation?

On your workspaces page, you will find charts that show the CPU and memory utilization of your workstation. If you think your workstation is under or over-provisioned, you can always update the resources available to your workstation.

Are workstations tied to a specific perimeter?

No, workstations are not tied to a specific perimeter. Think about them as being “laptops in the cloud”. This means that you can access any perimeter you have access to from the workstation as well.

Are actions on the workstation audited?

The following actions are audited:

  1. Workstation creation
  2. Updating the specs of the workstation
  3. Hibernating the workstation
  4. Restarting the workstation

The activity log for a workstation is available in the workspaces view.

When does a workstation actively occupy cloud instances?

As long as the workstation is in a running state (regardless of whether you are currently working on it), it will occupy cloud resources, and consequently be counted as active usage by your cloud provider during billing.

We recommend enabling auto-hibernation on your workstation so that the workstation automatically hibernates when you’re not using it.

Is workstation usage included in our cost reports?

Yes, your cost reports include workstation usage.

Can I switch perimeters inside my workstation?

Yes, you can use outerbounds perimeter switch --id <perimeter name> to switch between perimeters on your workstation. You can also use the pre-installed Outerbounds VSCode extension on your workstation by clicking on the O icon in the extensions bar. You can then click on the perimeter you wish to switch to.
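
If you want to switch perimeters from a script rather than by typing the command, a thin wrapper like the following would do. It is a sketch that simply shells out to the CLI command quoted above; the function names and the perimeter name "default" are hypothetical.

```python
import subprocess
from typing import List


def perimeter_switch_command(perimeter: str) -> List[str]:
    """Build the `outerbounds perimeter switch --id <perimeter name>` command."""
    return ["outerbounds", "perimeter", "switch", "--id", perimeter]


def switch_perimeter(perimeter: str) -> None:
    """Run the switch on the workstation; raises if the CLI exits non-zero."""
    subprocess.run(perimeter_switch_command(perimeter), check=True)


# Printing only; call switch_perimeter("default") on a workstation to run it.
print(" ".join(perimeter_switch_command("default")))
# outerbounds perimeter switch --id default
```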

Can I use credentials associated with a perimeter outside flows on a workstation?

Yes, on AWS and Google Cloud, your perimeter credentials are available throughout the workstation environment. This means that any CLI tools that you use like aws-cli or gcloud, any scripts that you write, or notebooks that you author will all automatically use the credentials associated with your perimeter, regardless of whether you’re using metaflow in that script/notebook/cli.

When you switch perimeters, the credentials are automatically swapped out, and the same CLI commands, scripts, notebooks or flows would now use the credentials associated with the new perimeter that you just switched to.
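
One way to see the credential swap in action on AWS is to ask the AWS CLI which identity it resolves to before and after switching perimeters. The sketch below assumes aws-cli is installed on the workstation; the helper names are hypothetical, and the exact role you will see depends on your deployment.

```python
import json
import subprocess


def arn_from_identity(identity_json: str) -> str:
    """Extract the Arn field from `aws sts get-caller-identity` JSON output."""
    return json.loads(identity_json)["Arn"]


def current_aws_arn() -> str:
    """Return the ARN that the ambient workstation credentials resolve to."""
    result = subprocess.run(
        ["aws", "sts", "get-caller-identity", "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return arn_from_identity(result.stdout)


# On a workstation:
#   before = current_aws_arn()
#   ... switch perimeters ...
#   after = current_aws_arn()
# The two values should differ if the perimeters map to different roles.
```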

My workstation has been stuck on “Opening remote” for a while now, what can I do to fix it?

Try restarting your workstation by hibernating it first and then restarting it from the Outerbounds UI. Also restart VSCode. This should resolve the issue 99% of the time. If you’re still stuck, please contact the Outerbounds team!

Can I use Jupyter Notebooks or Jupyter Lab on my workstation?

Yes, you can use Jupyter Lab or Jupyter Notebook from your workstation. To do so, follow these steps:

  • Connect to your workstation using the Outerbounds VSCode extension for VSCode desktop.
  • Install jupyterlab or jupyter notebook using pip.
    • pip install jupyterlab notebook
  • Launch jupyter from your workstation:
    • jupyter lab (for running jupyter lab)
    • jupyter notebook (for notebooks)
  • At this point, the jupyter notebook/lab will start running on the workstation’s localhost, and VSCode will automatically port forward it to your desktop/laptop. This means that you should now be able to access your lab or notebook instance directly from your browser.
    • Make sure to leave the workstation VSCode window open in the background while you work on your lab/notebook.
  • You can copy and paste the printed URL from the output of the command in your browser and you should be all set to use lab/notebook for your work!