
Introduction to Data Warehouse

:::note
"Data Warehouse" was previously called "Dataset", a name that is easily read as meaning only labeled data for machine learning. In practice it can store any other kind of data as well, including code and trained model files. To remove this ambiguity and to prepare for the new "Model Deployment" feature, HyperAI has renamed the original "Dataset" concept to "Data Warehouse".
:::

Currently, there are two categories under Data Warehouse:

  • Dataset: All data except model-related content can be placed here
  • Model: Used to store model files, code used in conjunction with model files, etc.

Creating a Data Warehouse

The two types of data warehouses have separate entry points for creation.

Creating a Dataset

Creating a Model

:::caution Note
Because both are data warehouses, a "Model" and a "Dataset" cannot share the same name.
:::
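The naming rule above can be pictured as a single namespace shared by both categories. The following is only a minimal sketch of that idea: `WarehouseRegistry` and its `create` method are hypothetical names invented for illustration and are not part of any HyperAI API.

```python
class WarehouseRegistry:
    """Hypothetical in-memory model of one user's data warehouses."""

    def __init__(self):
        # Maps warehouse name -> category ("dataset" or "model").
        self._category_by_name = {}

    def create(self, name, category):
        if category not in ("dataset", "model"):
            raise ValueError(f"unknown category: {category!r}")
        # Both categories share one namespace, so a name clash is
        # rejected no matter which category the existing entry has.
        if name in self._category_by_name:
            existing = self._category_by_name[name]
            raise ValueError(f"{name!r} already exists as a {existing}")
        self._category_by_name[name] = category


registry = WarehouseRegistry()
registry.create("mnist", "dataset")
try:
    registry.create("mnist", "model")
except ValueError as err:
    print(err)  # prints: 'mnist' already exists as a dataset
```

In this sketch the check is keyed on the name alone, which is why creating a "Model" named `mnist` fails once a "Dataset" with that name exists.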

Switching Data Warehouse Types

On the data warehouse's "Settings" page, you can switch its type between "Dataset" and "Model".

Copying Between Data Warehouses

To make datasets easier to manage, a data warehouse version can be created not only from a working directory but also from a subdirectory of an existing data warehouse version.

In a directory of a data warehouse version, click "Copy current directory to dataset", select the target dataset, and choose either "Add to existing dataset" or "Create new dataset".

  • "Add to existing dataset" will add the current data warehouse's subdirectory to the selected existing dataset.
  • "Create new dataset" will create a new dataset version in the target dataset from the current data directory.

While the data is being copied, the new dataset version has the status "Copying data". Once copying finishes, the status changes to "Processing complete" and the version is ready for use.
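A script that depends on the copied version would need to wait for the second status before using it. The sketch below shows one way to express that wait. It assumes the caller supplies a `get_status` function that returns the current status string; this document does not describe any API for querying the status, so `get_status` and `wait_until_ready` are hypothetical.

```python
import time

COPYING = "Copying data"
COMPLETE = "Processing complete"


def wait_until_ready(get_status, interval=5.0, timeout=600.0):
    """Poll get_status() until it reports COMPLETE or the timeout expires.

    get_status is a caller-supplied (hypothetical) callable returning the
    current status string of a dataset version.
    Returns True if the version became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == COMPLETE:
            return True
        time.sleep(interval)
    return False


# Usage with a fake status source standing in for the real platform.
fake_statuses = iter([COPYING, COPYING, COMPLETE])
ready = wait_until_ready(lambda: next(fake_statuses), interval=0, timeout=5)
print(ready)  # prints: True
```

Polling with a timeout keeps the script from blocking forever if a copy never completes.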

Adding README.md File to Data Warehouse

Each data warehouse version can include a file named README.md that describes that version. Its contents are displayed on the data warehouse version page.
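As an illustration, a README.md can be generated alongside the files before they are saved as a version. This is a sketch under stated assumptions: it assumes README.md sits at the top level of the directory that becomes the version, and the helper `write_readme`, the template layout, and the file names are all made up for the example.

```python
import tempfile
from pathlib import Path

README_TEMPLATE = """\
# {title}

{summary}

## Contents

{contents}
"""


def write_readme(directory, title, summary, files):
    """Write a README.md into `directory` describing the given files.

    `files` maps a file name to a one-line description.
    """
    contents = "\n".join(f"- `{name}`: {desc}" for name, desc in files.items())
    path = Path(directory) / "README.md"
    path.write_text(
        README_TEMPLATE.format(title=title, summary=summary, contents=contents),
        encoding="utf-8",
    )
    return path


# Usage: a temporary directory stands in for the real working directory.
with tempfile.TemporaryDirectory() as tmp:
    readme = write_readme(
        tmp,
        title="Example model",
        summary="Illustrative description of this version.",
        files={"model.pt": "trained weights", "infer.py": "inference script"},
    )
    text = readme.read_text(encoding="utf-8")

print(text)
```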

Making Data Warehouse Public

A newly created data warehouse is a "Private Data Warehouse" by default. On the data warehouse's "Settings" page, you can set the entire data warehouse as a "Public Data Warehouse". Once it is public, all registered users can access it through its URL.

:::note
The number of "Public Data Warehouses" each person can create is limited. This limit can be viewed in "Resource Usage" - "Quota Limits" - "Public Datasets".
:::