Action Recognition On Epic Kitchens 100
Evaluation metrics
Action@1
GFLOPs
Noun@1
Verb@1
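As a rough illustration of how the three accuracy metrics relate, here is a minimal sketch. It assumes that Verb@1 and Noun@1 are top-1 accuracies in percent, and that Action@1 counts a clip as correct only when both the top-1 verb and the top-1 noun match the ground truth. That convention is an assumption; this page does not define the metrics. The function name `top1_accuracies` and its arguments are hypothetical and are not taken from any benchmark toolkit.

```python
from typing import Sequence, Tuple


def top1_accuracies(
    verb_pred: Sequence[int],
    noun_pred: Sequence[int],
    verb_true: Sequence[int],
    noun_true: Sequence[int],
) -> Tuple[float, float, float]:
    """Return (Verb@1, Noun@1, Action@1) as percentages.

    Assumption: an action is a (verb, noun) pair and is counted as
    correct only if both top-1 class predictions are correct.
    """
    n = len(verb_true)
    if n == 0 or not (len(verb_pred) == len(noun_pred) == len(noun_true) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    # Per-clip correctness of the top-1 verb and noun predictions.
    verb_hits = [p == t for p, t in zip(verb_pred, verb_true)]
    noun_hits = [p == t for p, t in zip(noun_pred, noun_true)]
    # A clip's action is correct only when verb and noun are both correct.
    action_hits = [v and m for v, m in zip(verb_hits, noun_hits)]

    return (
        100.0 * sum(verb_hits) / n,
        100.0 * sum(noun_hits) / n,
        100.0 * sum(action_hits) / n,
    )
```

Under this convention Action@1 can never exceed either Verb@1 or Noun@1, which is consistent with every row of the results table.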
Results
Performance of each model on this benchmark
| Model | Action@1 | GFLOPs | Noun@1 | Verb@1 | Paper Title | Repository |
|---|---|---|---|---|---|---|
| Avion (ViT-L) | 54.4 | - | 65.4 | 73.0 | Training a Large Video Model on a Single Machine in a Day | |
| M&M (WTS 60M) | 53.6 | - | 66.3 | 72.0 | M&M Mix: A Multimodal Multiview Transformer Ensemble | - |
| LVMAE | 52.1 | - | 61.8 | 75.0 | Extending Video Masked Autoencoders to 128 frames | - |
| TAdaFormer-L/14 | 51.8 | - | 64.1 | 71.7 | Temporally-Adaptive Models for Efficient Video Understanding | |
| LaViLa (TimeSformer-L) | 51 | - | 62.9 | 72 | Learning Video Representations from Large Language Models | |
| MTV-B (WTS 60M) | 50.5 | - | 63.9 | 69.9 | Multiview Transformers for Video Recognition | |
| OMNIVORE (Swin-B, finetuned) | 49.9 | - | 61.7 | 69.5 | Omnivore: A Single Model for Many Visual Modalities | |
| CAST-B/16 | 49.3 | - | 60.9 | 72.5 | CAST: Cross-Attention in Space and Time for Video Action Recognition | |
| TAdaConvNeXtV2-S | 48.9 | - | 60.2 | 71.0 | Temporally-Adaptive Models for Efficient Video Understanding | |
| MeMViT-24 | 48.4 | - | 60.3 | 71.4 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | |
| MMT | 47.8 | - | 61.0 | 70.1 | Multiscale Multimodal Transformer for Multimodal Action Recognition | - |
| MoViNet-A6 | 47.7 | 117x1 | 57.3 | 72.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| AVT | 47.2 | - | 59.3 | 70.4 | AVT: Audio-Video Transformer for Multimodal Action Recognition | - |
| ORViT Mformer-L (ORViT blocks) | 45.7 | - | 58.7 | 68.4 | Object-Region Video Transformers | |
| TempAgg | 45.26 | - | 53.35 | 66 | Technical Report: Temporal Aggregate Representations | |
| MoViNet-A5 | 44.5 | 74.9x1 | 55.1 | 69.1 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| Mformer-HR | 44.5 | - | 58.5 | 67.0 | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | |
| GSF | 44.48 | - | 53.18 | 69.06 | Gate-Shift-Fuse for Video Action Recognition | |
| MoViNet-A4 | 44.4 | 42.2x1 | 56.2 | 68.8 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| Mformer-L | 44.1 | - | 57.6 | 67.1 | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | |