Computing Container Data Binding

Bindable Content

When creating or restarting a container, you can choose to bind data into the container. Even massive datasets can be bound and accessed directly as a file system. Bindable content includes the following categories:

Public datasets
Personal private datasets
Working directories of public tutorials
Working directories of personal private executions
Newly uploaded code

As shown in the figure below, the container's "Data Binding" page lists all data available for binding, and you can filter the list by fields such as name and ID.

Binding to Working Directory

Each time a container runs a task, an independent storage space is created and bound to the /openbayes/home directory; this is the container's working directory. The /output directory is a soft link that also points to /openbayes/home. When the execution is closed, the contents of the working directory are saved, and these saved contents are what appears as that execution's "working directory" when you choose data to bind.

Note that binding data to the working directory involves copying that data. As a result, the container's startup time grows with the size of the bound data, and the copied data takes up storage space in the working directory.
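Because both startup time and storage use scale with the size of the copied data, it can help to check how large a directory is before deciding to bind it to the working directory rather than to a data directory. A minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes you can already read the data (for example, bound read-only in an earlier container):

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path`, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked data is not counted twice
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human(n: float) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} PiB"


# Example (path is illustrative):
# print(human(dir_size_bytes("/openbayes/input/input0")))
```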

Binding to Data Directory

In addition to the working directory, you can bind data to the following data directories when creating a container:

/openbayes/input/input0
/openbayes/input/input1
/openbayes/input/input2
/openbayes/input/input3
/openbayes/input/input4

Data directory binding has two modes:

Read-write binding: allows you to add, update, and delete the bound data.
Read-only binding: you can only read the bound data and cannot add, update, or delete it.
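Inside a running container, you can check which of the five data directories are bound and in which mode. A minimal sketch, assuming that an unbound input directory does not exist and that a read-only binding shows up as a directory the process cannot write to (for example, a read-only mount); both are assumptions about the platform rather than documented behavior, and `binding_mode` is a hypothetical helper:

```python
import os

# The five data directories listed above
INPUT_DIRS = [f"/openbayes/input/input{i}" for i in range(5)]


def binding_mode(path: str) -> str:
    """Return 'unbound', 'read-only' or 'read-write' for a data directory."""
    if not os.path.isdir(path):
        # Assumption: unbound directories are not present in the container
        return "unbound"
    # os.access reports W_OK as unavailable on a read-only mount (EROFS),
    # even for the root user
    return "read-write" if os.access(path, os.W_OK) else "read-only"


for d in INPUT_DIRS:
    print(d, binding_mode(d))
```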

Read-Write Binding

For datasets or models with read-write permissions, you can choose "read-write binding". In this mode, you can directly access the corresponding directory and update data. The following scenarios are suitable for read-write binding:

Preprocess uploaded raw datasets and delete unnecessary data.
Bind two datasets, extract part of the data from one, and save it into the other.
Create an empty dataset version, then save data from the container into it.
Create an empty dataset version, then use the rsync command to copy local data into it.
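The second scenario above can be sketched as follows. It assumes the source dataset is bound at /openbayes/input/input0 and an empty dataset version is bound read-write at /openbayes/input/input1; the function name `copy_subset`, the `*.jpg` pattern, and the `limit` parameter are hypothetical choices for illustration:

```python
import shutil
from pathlib import Path
from typing import List, Optional


def copy_subset(src: Path, dst: Path, pattern: str = "*.jpg",
                limit: Optional[int] = None) -> List[Path]:
    """Copy files under `src` matching `pattern` into `dst`, keeping relative paths."""
    copied: List[Path] = []
    for f in sorted(src.rglob(pattern)):
        if not f.is_file():
            continue
        if limit is not None and len(copied) >= limit:
            break
        target = dst / f.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copies content and metadata such as mtime
        copied.append(target)
    return copied


# Example (hypothetical layout):
# copy_subset(Path("/openbayes/input/input0"), Path("/openbayes/input/input1"),
#             "*.jpg", limit=1000)
```

Reading from a read-only source is enough here; only the destination needs a read-write binding.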

:::warning Note Only datasets or models on which you have read-write permission can be bound in read-write mode. Public datasets you cannot write to, and the working directories of other executions, can only be bound read-only. :::

:::info Read-Write Binding Usage Recommendations

Large Dataset Processing
- When the data to be processed exceeds the capacity limit of the working directory (/openbayes/home)
- Create an empty dataset and bind it to an input directory in read-write mode
- Write processing results directly to the bound directory so they do not take up working directory space
Data Persistence
- For important data that needs long-term storage
- For data that needs to be shared among multiple containers
- Create dedicated datasets to manage such data rather than storing it in the working directory
Performance Considerations
- Data written through a read-write binding goes directly to the data warehouse and does not take up working directory space
- Suitable for large datasets that require frequent read and write operations
- Can avoid data synchronization overhead when containers are closed :::
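Following the large-dataset recommendation, the sketch below writes results into a read-write bound directory when one is available and falls back to /openbayes/home only otherwise. The helper names, the candidate directory, and the processing step (dropping blank lines from a text file, streamed line by line) are hypothetical placeholders for your own pipeline:

```python
import os
from pathlib import Path
from typing import Iterable

WORK_DIR = Path("/openbayes/home")


def pick_output_dir(candidates: Iterable[str], fallback: Path = WORK_DIR) -> Path:
    """Return the first candidate that exists and is writable, else the fallback."""
    for c in candidates:
        p = Path(c)
        if p.is_dir() and os.access(p, os.W_OK):
            return p
    return fallback


def process_file(src: Path, dst: Path) -> int:
    """Stream `src` line by line, drop blank lines, write to `dst`; return lines kept."""
    kept = 0
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(line)
                kept += 1
    return kept


# Example (hypothetical paths):
# out_dir = pick_output_dir(["/openbayes/input/input1"])
# process_file(Path("/openbayes/input/input0/raw.txt"), out_dir / "clean.txt")
```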

Read-Only Binding

For public datasets, public models, and public tutorials created by others on which you have no write permission, only "read-only binding" is available. In this mode you can read the data but cannot add, update, or delete it.

As shown in the image above, the container creation page displays a notice for data that is bound read-only.

Binding a Git Repository

When creating an execution, you can bind code from a specified external Git repository, which avoids having to download it locally and then upload it again. The following limitations apply:

The Git repository must be publicly accessible; private repositories, such as private GitHub repositories or repositories that require HTTP authentication, are not currently supported
When a Git repository is bound, its code is cloned directly into /output, so no other data can be bound to /output at the same time
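Because only publicly accessible repositories can be bound, it may be worth checking a URL before creating the execution. The sketch below is not an OpenBayes feature: it assumes `git` is installed on your local machine and that the URL uses HTTPS. It runs `git ls-remote` with terminal prompts disabled (`GIT_TERMINAL_PROMPT=0`) and credential helpers cleared (`-c credential.helper=`), so it succeeds only when the repository can be read anonymously. The function name is hypothetical.

```python
import os
import subprocess


def is_publicly_cloneable(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote` succeeds without any credentials."""
    env = dict(os.environ)
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail instead of prompting for a username/password
    cmd = ["git", "-c", "credential.helper=", "ls-remote", url]  # empty helper list
    try:
        result = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time
        return False
    return result.returncode == 0


# Example:
# is_publicly_cloneable("https://github.com/<owner>/<repo>.git")
```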