Image Retrieval on PhotoChat
Evaluation Metrics
R@1
R@10
R@5
Sum(R@1,5,10)
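The metrics above can be sketched as follows. This is a minimal illustration, not the benchmark's official evaluation script: it assumes each query (dialogue context) has exactly one ground-truth image and that the model returns a ranked list of candidate image IDs. The function names `recall_at_k` and `retrieval_report` are hypothetical.

```python
from typing import Dict, Sequence


def recall_at_k(ranked_ids: Sequence[Sequence[int]],
                gold_ids: Sequence[int],
                k: int) -> float:
    """Percentage of queries whose gold image is among the top-k candidates.

    Assumption: one gold image per query; ranked_ids[i] is sorted best-first.
    """
    hits = sum(1 for ranking, gold in zip(ranked_ids, gold_ids)
               if gold in ranking[:k])
    return 100.0 * hits / len(gold_ids)


def retrieval_report(ranked_ids: Sequence[Sequence[int]],
                     gold_ids: Sequence[int],
                     ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Recall at each cutoff in ks, plus their sum (the Sum column)."""
    report = {f"R@{k}": recall_at_k(ranked_ids, gold_ids, k) for k in ks}
    report["Sum"] = sum(report.values())
    return report


# Toy example: 4 queries with 3 candidates each, so cutoffs 1, 2, 3 are used.
rankings = [[3, 1, 2], [0, 5, 4], [7, 8, 9], [2, 6, 1]]
gold = [3, 4, 0, 6]
print(retrieval_report(rankings, gold, ks=(1, 2, 3)))
```

Because the top-k list always contains the top-(k-1) list, recall can only stay the same or grow as k increases, so R@1 ≤ R@5 ≤ R@10 for any model.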
Results
Performance of each model on this benchmark
| Model | R@1 | R@10 | R@5 | Sum(R@1,5,10) | Paper Title | Repository |
|---|---|---|---|---|---|---|
| PaCE | 15.2 | 49.6 | 36.7 | 101.5 | PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | |
| VLMo | 11.5 | 39.4 | 30.0 | 83.2 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | |
| ViLT | 11.5 | 33.8 | 25.6 | 71.0 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | |
| SCAN | 10.4 | 37.1 | 27.0 | 74.5 | Stacked Cross Attention for Image-Text Matching | |
| DE++ | 9.0 | 35.7 | 26.4 | 71.1 | PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling | - |