Model Deployment Introduction
Introduction to model deployment in HyperAI
After completing model training, you can deploy the model to a server or store it on a device to provide real-time model inference services. "Model Deployment (Serving)" is the server-side model inference functionality provided by HyperAI.
Deployment Modes
HyperAI model deployment supports two deployment modes:
- Custom Deployment (Recommended): Fully customize how the service starts by writing a `start.sh` startup script
- Traditional `predictor.py` approach: Use the predefined framework provided by HyperAI
API Key Authentication
HyperAI model deployment supports secure authentication using API Keys. Compared to JWT Token authentication, API Keys offer:
- More fine-grained access control
- Support for independent key management and tracking
- Alignment with industry standard practices (such as OpenAI, HuggingFace, etc.)
You can create and manage API Keys in the model deployment settings page. For detailed information, please refer to API Key Management.
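Once a key has been created, requests to the deployed endpoint can carry it in an HTTP header. Below is a minimal client sketch in Python, assuming the `Authorization: Bearer` header convention; the URL, header scheme, and request payload are illustrative, so use the values shown on your deployment's details page:

```python
# Hypothetical client call; replace the URL, header scheme, and payload
# with the values shown on your deployment's details page.
import requests

API_KEY = "your-api-key"  # created in the model deployment settings page
URL = "https://example-deployment.hyperai.example/predict"  # illustrative URL

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed Bearer scheme
    json={"input": "hello"},  # payload shape depends on your service
)
print(resp.status_code, resp.json())
```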
Custom Deployment Method (Recommended)
This is the simplest and most flexible deployment method. You only need to:
- Prepare model files
- Write a `start.sh` script to start your service
Custom Deployment Requirements
- Required files:
  - `start.sh` - Startup script, which must:
    - Listen on port 80
    - Handle HTTP requests
  - Model files and other dependency files
- Optional files:
  - `requirements.txt` - For installing Python dependencies
  - `conda-packages.txt` - For installing Conda dependencies
  - `dependencies.sh` - For installing system dependencies
  - `.env` - For setting environment variables
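Putting these together, an upload for a custom deployment might be laid out as follows; apart from `start.sh` and the optional files named above, the file names here are illustrative:

```
.
├── start.sh           # required: starts the service on port 80
├── app.py             # your service code (name is up to you)
├── requirements.txt   # optional: Python dependencies
├── .env               # optional: environment variables
└── model/             # your model files
```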
Example
You can use any framework (such as FastAPI, Flask, Gradio, etc.) to provide services. Here is a simple example using FastAPI:
```python
# app.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def predict():
    return {"message": "Hello World"}
```

```bash
#!/bin/bash
# start.sh
pip install fastapi uvicorn
uvicorn app:app --host 0.0.0.0 --port 80
```

Data Binding
When creating a model deployment, you can bind one or more data directories. The data binding method is basically the same as data binding for model training, and you can choose from the following sources:
- Public datasets or models
- Personal private datasets or models
- Working directories of compute containers
- Data repositories uploaded via file upload
Data Binding Characteristics
Model deployment data binding differs from model training in the following ways:
- Read-only Binding: All data bindings are mounted in read-only mode; no write or modification operations are allowed
- Multiple Directory Binding: Multiple data directories can be bound simultaneously to different mount points (see the sketch after this list):
  - `/openbayes/input/input0`
  - `/openbayes/input/input1`
  - `/openbayes/input/input2`
  - `/openbayes/input/input3`
  - `/openbayes/input/input4`
- Working Directory Characteristics:
  - Contents of the working directory (`/openbayes/home`) are copied from the binding source when the deployment starts
  - Important note: Since the contents of the working directory are lost after a restart, place all necessary model files and dependencies in the bound data directories
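For example, a service can read its weights from one of the bound mount points at startup. Here is a minimal sketch, assuming a model file named `model.pt` was bound to the first mount point; both the file name and the loading step are illustrative:

```python
# app.py (excerpt): load a model from a read-only bound directory.
# /openbayes/input/input0 is the first mount point; model.pt is illustrative.
from pathlib import Path

MODEL_DIR = Path("/openbayes/input/input0")

def load_model():
    model_path = MODEL_DIR / "model.pt"  # hypothetical file name
    if not model_path.exists():
        raise FileNotFoundError(f"expected model file at {model_path}")
    # Load with your framework of choice, e.g. torch.load(model_path).
    return model_path.read_bytes()  # placeholder for a real loader
```

Because bound directories are read-only, any runtime artifacts the service produces (caches, temporary files) must be written to a writable location instead.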
Selecting Binding Directories
The selection method for binding is the same as for compute containers.
Version Management
Model deployment supports version management:
- Versions are independent of each other and can support different runtime environments, resource types, and deployment contents
- When a new version is deployed, the old version will be automatically taken offline
- Version numbers increment as a numeric sequence
Detailed operations are introduced in Managing Model Deployments.
Traditional Deployment Method (predictor.py)
If you wish to use the predefined framework provided by HyperAI, you can choose this method.
- Required files:
  - `predictor.py` - Model deployment script containing the `Predictor` class
  - Model files
- Optional files:
  - `requirements.txt`, `conda-packages.txt` - For installing dependencies
  - `dependencies.sh` - For installing system dependencies
  - `.env` - For setting environment variables
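For orientation, a `predictor.py` centers on a `Predictor` class that loads the model once and then serves each request. The skeleton below is an illustrative sketch only; the method names, signatures, and entry point assumed here are not the authoritative interface, which is defined in Writing Serving Services:

```python
# predictor.py - illustrative skeleton only; the actual interface expected
# by the HyperAI framework is described in Writing Serving Services.

class Predictor:
    def __init__(self):
        # Load the model once at startup. A real service would read weights
        # from a bound directory, e.g. /openbayes/input/input0.
        self.model = lambda text: text.upper()  # stand-in for a real model

    def predict(self, payload):
        # Run inference on one request and return a JSON-serializable result.
        return {"result": self.model(payload.get("input", ""))}

if __name__ == "__main__":
    # Local smoke test of the class itself.
    print(Predictor().predict({"input": "hello"}))
```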
Detailed writing rules are introduced in Writing Serving Services, and writing examples are available for reference in the openbayes-serving-examples model repository.