HyperAIHyperAI

Command Palette

Search for a command to run...

One-click Deployment of Phi-3.5-vision-instruct

Model Introduction

Phi-3.5-vision-instruct is a multimodal model in the Phi-3.5 series released by Microsoft, designed for applications that process text and visual input. The model supports a context length of 128K and has undergone a rigorous fine-tuning and optimization process, making it suitable for widespread use in commercial and research fields in environments with limited memory or computing resources and high low-latency requirements. The Phi-3.5-vision-instruct model has extensive image understanding, optical character recognition (OCR), chart and table parsing, multi-image or video clip summarization, and other functions, making it very suitable for a variety of AI-driven applications. It shows significant performance improvements in benchmarks related to image and video processing. The architecture of the model includes a 4.2 billion parameter system that integrates an image encoder, connector, projector, and Phi-3 Mini language model. The training used 256 NVIDIA A100-80G GPUs, the training time was 6 days, and the training data included 500 billion tokens (visual and text).

The Phi-3.5-vision-instruct model scored 43.0 in Multimodal Multi-Image Understanding (MMMU), demonstrating its enhanced ability to handle complex image understanding tasks. In addition, the model was trained using high-quality educational data, synthetic data, and strictly screened public documents to ensure data quality and privacy.

This tutorial can be started using a single 4090 card.

How to run

1. 克隆并成功启动容器后,等待约 10s,将鼠标悬浮在「API 地址」处,拷贝链接到新标签页打开
2. 可以看到如下界面
3. 点击上传图片,选择模型,并输入问题,点击 Submit
4. 生成结果

Exchange and discussion

🖌️ If you see a high-quality project, please leave a message in the background to recommend it! In addition, we have also established a tutorial exchange group. Welcome friends to scan the QR code and remark [SD Tutorial] to join the group to discuss various technical issues and share application effects↓ 

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
One-click Deployment of Phi-3.5-vision-instruct | Tutorials | HyperAI