Introduction to Compute Containers

An introduction to using HyperAI compute containers

Execution Model of Containers

As a computing unit, a compute container can execute various computational tasks, including data preprocessing, machine learning model training, and inference on unlabeled data with existing models. A container comprises the following key elements:

  1. Basic hardware configuration, currently covering four major elements: CPU, GPU, memory, and storage, specified via the compute type.
  2. Basic runtime environment, mainly the desired deep learning framework and its supporting dependencies, specified via the image. For the full dependency list of each image, refer to Runtime Environment.
  3. Required code and data, provided by binding data, binding the "working directory" of another container's execution, or uploading code directly.

Each execution of a container allocates its own storage and saves the data written during that execution, so every execution under a container is independent. Combined with tools such as custom parameters and Parameters, this can, when used properly, greatly improve the reproducibility of machine learning experiments. Without a good grasp of these concepts, however, data may end up being copied back and forth between executions, which not only slows down container startup but also adds a large amount of unnecessary storage overhead.

Container Creation

Containers currently support two modes: "Python Script Execution" and "Jupyter Workspace". The default working directory for both is the system's /output directory (a soft link is also created at /openbayes/home, so /openbayes/home and /output are the same directory).

A container can create multiple "executions", each of which is an independent run that can use its own compute configuration and image. After each "execution" is closed, the contents of its working directory /openbayes/home are saved and can be viewed through the "Working Directory" tab on the page.
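
A minimal sketch of what this layout looks like from inside a running container, assuming the directory structure described above (/openbayes/home as a soft link to /output):

```python
import os

# /openbayes/home is a soft link to /output, so both paths resolve to the
# same directory.
print(os.path.realpath("/openbayes/home"))  # expected: /output
print(os.path.realpath("/openbayes/home") == os.path.realpath("/output"))  # expected: True
```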

:::info When a container executes, its working directory is /openbayes/home, so files in other bound data repositories under /openbayes/input/input0-4 must be referenced by absolute path, while uploaded code can be executed using relative paths.

For example, when creating a "Python Script Execution", if you upload a file named train.py that needs to read data from the /openbayes/input/input0 directory, you must reference that data by its absolute path /openbayes/input/input0/<filename> during execution. However, to execute the file itself, you can use the relative command python train.py instead of python /openbayes/home/train.py. :::
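
A minimal sketch of such a train.py is shown below. The data file name data.csv is hypothetical; note that the bound dataset is read via an absolute path, while the script itself is launched from the working directory with a relative command:

```python
# train.py - uploaded code runs from the working directory (/openbayes/home),
# so it can be launched simply with `python train.py`.

# Bound datasets must be referenced by absolute path; `data.csv` is a
# hypothetical file inside the dataset bound to input0.
DATA_PATH = "/openbayes/input/input0/data.csv"

with open(DATA_PATH) as f:
    print(f"read {len(f.readlines())} lines from {DATA_PATH}")

# Anything written to the working directory is preserved after the execution
# closes and shows up under the "Working Directory" tab.
with open("/openbayes/home/metrics.txt", "w") as f:
    f.write("ok\n")
```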

Data Binding

See Container Data Binding.

Jupyter Workspace

Jupyter Workspace is an interactive runtime environment we developed based on JupyterLab, which has become the default working environment for many data scientists; Python is the first supported programming language. By accessing compute containers through Jupyter Workspace, you can use computing resources just as you would in any other environment.

Jupyter Workspace supports two environments, Notebook and Lab, with Lab as the default. If you are not yet familiar with Jupyter Workspace, you can refer to its documentation or related Chinese translations. Rather than exhaustively covering every aspect of its usage, this section highlights several key features of Jupyter Workspace under HyperAI.

For more information, see Jupyter Workspace.

Continue Execution

Generally speaking, multiple executions under the same "container" share significant commonality in business logic. To make it easy to create new "executions" based on execution history, HyperAI currently provides a "Continue Execution" option.

HyperAI will do the following for us:

  1. Bind the data repositories that were bound in the previous "execution" to the same locations
  2. If the previous "execution" was a "Python Script Execution", bind the same code as well
  3. Bind the "working directory" of the previous "execution" to the /openbayes/home directory

:::info In HyperAI, you can bind one container's "working directory" to a new container to achieve a "pipeline" effect, for example using the "working directory" of a previous "model training" execution as the input of a "model inference" task. Note, however, that binding it as the working directory copies the entire previous "working directory" into the new container, doubling the storage used. Therefore, if you don't need to write to the previous execution's "working directory", it's recommended to bind it to one of the "input0-4" directories instead, which links the data into the new container read-only without consuming additional storage. :::
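
A sketch of this read-only "pipeline" pattern, assuming the previous training execution's working directory is bound at /openbayes/input/input0 and contains a hypothetical checkpoint file named model.pt:

```python
# inference.py - reads directly from the read-only binding; nothing is
# copied, so no extra storage is consumed.
CKPT = "/openbayes/input/input0/model.pt"

with open(CKPT, "rb") as f:
    weights = f.read()
print(f"loaded {len(weights)} bytes from {CKPT}")

# Write inference results to the working directory, which is saved when
# the execution closes.
with open("/openbayes/home/predictions.txt", "w") as f:
    f.write("...\n")
```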

In addition to using "Continue Execution" on the execution page, you can also operate from the "Execution Records" page.

Scenarios Where Code is Modified After Selecting Continue Execution

"Continue Execution" is intended to facilitate users continuing previous training with unchanged code. Special attention is needed if code is updated in a "Continue Execution" scenario.

After clicking "Continue Execution", uploading new code may conflict with the code already present in the bound "working directory from the previous execution". For example, suppose the previous execution uploaded a file named main.py, which was saved to that execution's "working directory". If you now upload a modified file that is also named main.py, HyperAI will ignore the modification and keep the existing file.

Therefore, if you use "Continue Execution" and find that the executed code does not match your expectations, it may be because the uploaded code was shadowed by the working directory bound from the previous execution. To avoid this, you can change the binding directory of the default "Working Directory from Last Execution".

How to Accelerate Container Startup

Saving a large number of files in the container's working directory (/openbayes/home, or equivalently /output) slows down container startup; copying a large number of small files in particular can be very time-consuming. When a container starts and enters the data-copying phase, its execution status changes to "Synchronizing Data" and displays the current synchronization speed.

You can instead create separate "Data Repositories" for data or models and bind them to /input0-4 through data binding, which avoids the copying process entirely. See Container Working Directory - Create Working Directory as Dataset Version for how to create a new dataset version from a container's "Working Directory".
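
If many small files must stay in the working directory, one way to reduce synchronization time is to pack them into a single archive before the execution closes. A sketch, assuming a hypothetical checkpoints/ folder:

```python
import tarfile
from pathlib import Path

# Pack a directory of many small files into one archive so that
# synchronization copies a single large file instead of thousands of
# small ones.
src = Path("/openbayes/home/checkpoints")
with tarfile.open("/openbayes/home/checkpoints.tar.gz", "w:gz") as tar:
    tar.add(src, arcname="checkpoints")

# Optionally remove the originals afterwards so they are not synced too:
# import shutil; shutil.rmtree(src)
```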

Set Notifications

Currently, HyperAI provides two notification channels: email notifications and SMS notifications. Email notifications are selected by default and cannot be disabled. Users can set SMS notifications according to their preferences.

Using Task and Jupyter Workspace Modes Together

Jupyter workspace mode is well suited to editing and running files interactively, but its use of compute is inefficient: resources sit idle while you edit and debug. Python script execution runs your code as soon as the container starts, making efficient use of compute, but it is cumbersome to iterate on, since the code must be re-uploaded after every change.

Therefore, it is recommended to first create a Jupyter workspace on a low-cost compute type (CPU), confirm that the code runs correctly, then shut down the resources and download the "Working Directory". After that, create a GPU container in Python script execution mode, upload the downloaded code, and run the script.
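
Writing device-agnostic code makes this workflow smoother, since the same script runs unchanged in both environments. A minimal sketch, assuming PyTorch is available in the chosen image:

```python
import torch

# Select the GPU when available and fall back to CPU otherwise, so the same
# script can be debugged in a cheap CPU Jupyter workspace and then run
# unchanged on a GPU container.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(16, 2).to(device)
x = torch.randn(8, 16, device=device)
print(model(x).shape, "on", device)
```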

Currently, the Jupyter workspace has the openbayes command-line tool built in, making it very convenient to create tasks from the command line inside the Jupyter environment.

Converting .ipynb Files to .py Files

Select "File" - "Export Notebook As..." - "Export Notebook to Executable Script" to download the current ipynb file in py format to your local machine. Then drag it to the file directory on the left side of the Jupyter workspace to upload the file to the container again:

You can see that the code cells of the ipynb file have been concatenated and saved into the .py file.
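
The same conversion can also be scripted with nbconvert, the library behind this menu item, which avoids the download-and-re-upload round trip. A sketch, with train.ipynb as a hypothetical notebook name:

```python
from nbconvert import ScriptExporter

# Convert the notebook to an executable script programmatically; this is
# the same nbconvert conversion the menu item performs.
body, _ = ScriptExporter().from_filename("train.ipynb")

with open("train.py", "w") as f:
    f.write(body)
```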

Creating Tasks Through Jupyter Workspace's Built-in Command-Line Tool

See the Create Python Script section.

Make Container Public

Containers are created as "Private Containers" by default. In the container's Settings interface, you can make the container public. For security reasons, other registered users can only view a public container's executions after those executions have been closed.

Container Termination

Containers can be terminated at any stage of execution, but note that terminating a container may leave some data not yet synchronized. Before terminating, confirm the integrity of the current data under the container's "Working Directory" tab.

Container Deletion

After a container finishes execution, the computing resources it occupied are released automatically. However, the files saved in its working directory are typically kept for later use and continue to occupy your storage quota. If you are sure the entire "container's" data is no longer needed, you can delete the whole container from the container's "Settings" tab. Once the container is deleted, all storage it occupied is released.

:::danger Warning This operation is very dangerous. Data deleted from the container cannot be recovered! :::