ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Xue Le ; Gao Mingfei ; Xing Chen ; Martín-Martín Roberto ; Wu Jiajun ; Xiong Caiming ; Xu Ran ; Niebles Juan Carlos ; Savarese Silvio

Abstract

The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories. In its 2D counterpart, recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language. Inspired by this, leveraging multimodal information for 3D modality could be promising to improve 3D understanding under the restricted data regime, but this line of research is not well studied. Therefore, we introduce ULIP to learn a unified representation of images, texts, and 3D point clouds by pre-training with object triplets from the three modalities. To overcome the shortage of training triplets, ULIP leverages a pre-trained vision-language model that has already learned a common visual and textual space by training with massive image-text pairs. Then, ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets. ULIP is agnostic to 3D backbone networks and can easily be integrated into any 3D architecture. Experiments show that ULIP effectively improves the performance of multiple recent 3D backbones by simply pre-training them on ShapeNet55 using our framework, achieving state-of-the-art performance in both standard 3D classification and zero-shot 3D classification on ModelNet40 and ScanObjectNN. ULIP also improves the performance of PointMLP by around 3% in 3D classification on ScanObjectNN, and outperforms PointCLIP by 28.8% on top-1 accuracy for zero-shot 3D classification on ModelNet40. Our code and pre-trained models are released at https://github.com/salesforce/ULIP.
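
The abstract describes learning a 3D representation space that is aligned with an existing, pre-trained image-text space. The abstract does not state the training objective, so the following is only a minimal sketch under the assumption that the alignment uses a generic symmetric contrastive (InfoNCE-style) loss over object triplets. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the ULIP code base.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(a_batch, b_batch, temperature=0.07):
    """Symmetric InfoNCE-style loss between two batches of embeddings.

    Row i of a_batch and row i of b_batch belong to the same object
    (a positive pair); every other row in the batch acts as a negative.
    The temperature default is an assumption chosen for illustration.
    """
    a = [normalize(v) for v in a_batch]
    b = [normalize(v) for v in b_batch]
    n = len(a)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[dot(a[i], b[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

def alignment_loss(pc_emb, img_emb, txt_emb):
    """Pull point-cloud embeddings toward image and text embeddings.

    In this sketch img_emb and txt_emb stand in for the outputs of a
    frozen, pre-trained vision-language model; only the 3D encoder
    that produces pc_emb would be trained.
    """
    return contrastive_loss(pc_emb, img_emb) + contrastive_loss(pc_emb, txt_emb)

# Toy triplets: two objects, 2-dimensional embeddings.
pc = [[1.0, 0.0], [0.0, 1.0]]
img = [[0.9, 0.1], [0.1, 0.9]]
txt = [[1.0, 0.1], [0.0, 1.0]]
aligned = alignment_loss(pc, img, txt)
misaligned = alignment_loss(pc, img[::-1], txt[::-1])
print(aligned < misaligned)
```

Because the image and text embeddings are treated as fixed inputs here, the loss depends only on how well the point-cloud embeddings match them, which is consistent with the abstract's claim that the approach is agnostic to the 3D backbone.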

Code Repositories

salesforce/ulip (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
3d-point-cloud-classification-on-modelnet40 | ULIP + PointNet++(ssg) | Mean Accuracy: 91.2; Overall Accuracy: 93.4
3d-point-cloud-classification-on-modelnet40 | ULIP + PointMLP | Mean Accuracy: 92.4; Overall Accuracy: 94.7
3d-point-cloud-classification-on-modelnet40 | ULIP + PointBERT | Overall Accuracy: 94.1
3d-point-cloud-classification-on-scanobjectnn | ULIP + PointBERT | Overall Accuracy: 86.4
3d-point-cloud-classification-on-scanobjectnn | ULIP + PointMLP | Mean Accuracy: 88.5; Overall Accuracy: 89.4
3d-point-cloud-classification-on-scanobjectnn | ULIP + PointNeXt | Mean Accuracy: 88.6; Number of params: 1.4M; Overall Accuracy: 89.7
training-free-3d-point-cloud-classification | ULIP | Accuracy (%): 60.4; Need 3D Data?: Yes
zero-shot-transfer-3d-point-cloud | ULIP + PointMLP | Accuracy (%): 61.5
zero-shot-transfer-3d-point-cloud | ULIP + PointBERT | Accuracy (%): 60.4
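
The benchmark entries above report both Mean Accuracy and Overall Accuracy without defining them. As a hedged illustration, the sketch below assumes the conventional definitions used in point-cloud classification: overall accuracy is the fraction of all samples classified correctly, and mean accuracy is the unweighted average of per-class accuracies. The function names and the toy labels are hypothetical.

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_class_accuracy(y_true, y_pred):
    """Unweighted average of per-class accuracies (assumed definition)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)

# Toy example with an imbalanced label distribution.
y_true = ["chair", "chair", "chair", "chair", "lamp"]
y_pred = ["chair", "chair", "chair", "chair", "chair"]
print(overall_accuracy(y_true, y_pred))     # 0.8
print(mean_class_accuracy(y_true, y_pred))  # 0.5
```

Under these definitions the two numbers diverge when classes are imbalanced, which is one reason benchmarks may list both.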
