Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation