Depthwise Separable Temporal Convolutional Network for Action Segmentation
Heiko Neumann, Wolfgang Mader, Christian Jarvers, Basavaraj Hampiholi
Abstract
Fine-grained temporal action segmentation in long, untrimmed RGB videos is a key topic in visual human-machine interaction. Recent temporal convolution based approaches either use an encoder-decoder (ED) architecture or dilations with a doubling factor in consecutive convolution layers to segment actions in videos. However, ED networks operate on low temporal resolution, and the dilations in successive layers cause gridding artifacts. We propose a depthwise separable temporal convolution network (DS-TCN) that operates on full temporal resolution and with reduced gridding effects. The basic component of DS-TCN is the residual depthwise dilated block (RDDB). We explore the trade-off between large kernels and small dilation rates using RDDB. We show that our DS-TCN is capable of capturing long-term dependencies as well as local temporal cues efficiently. Our evaluation on three benchmark datasets, GTEA, 50Salads, and Breakfast, demonstrates that DS-TCN outperforms the existing ED-TCN and dilation based TCN baselines even with comparatively fewer parameters.
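
The abstract's two quantitative themes, parameter savings from depthwise separable convolutions and the trade-off between kernel size and dilation rate, can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function names, the channel width of 64, and both layer configurations are hypothetical values chosen for illustration. It relies only on the standard parameter-count formulas for 1D convolutions and on the receptive-field formula for stacked stride-1 dilated convolutions.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a standard 1D convolution."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a depthwise conv (one filter per input channel)
    followed by a pointwise (kernel size 1) conv that mixes channels."""
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def receptive_field(kernel_sizes, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


# Hypothetical settings, NOT the paper's configuration.
channels = 64

# Ten layers, kernel size 3, dilation doubling per layer (1, 2, 4, ..., 512).
doubling_rf = receptive_field([3] * 10, [2 ** i for i in range(10)])

# Ten layers, larger kernel (15) with slowly growing dilations (1, 2, ..., 10).
large_kernel_rf = receptive_field([15] * 10, list(range(1, 11)))

print("receptive field, doubling dilations:", doubling_rf)
print("receptive field, large kernels:", large_kernel_rf)
print("standard conv params (k=15):", conv1d_params(channels, channels, 15))
print("separable conv params (k=15):",
      depthwise_separable_params(channels, channels, 15))
```

In a separable layer the kernel size only scales the depthwise term, so enlarging the kernel costs far fewer parameters than in a standard convolution. That is one plausible reading of why large kernels with small dilation rates remain affordable in this kind of design.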
Benchmarks
| Benchmark | Methodology | Acc | Edit | F1@10% | F1@25% | F1@50% | Average F1 |
|---|---|---|---|---|---|---|---|
| action-segmentation-on-50-salads-1 | DS-TCN | 80.0 | 70.0 | 77.0 | 74.43 | 65.78 | - |
| action-segmentation-on-breakfast-1 | DS-TCN | 70.75 | 69.02 | 67.70 | 62.05 | 49.18 | 59.6 |
| action-segmentation-on-gtea-1 | DS-TCN | 78.10 | 84.05 | 88.30 | 85.44 | 72.84 | - |
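
This page does not define the reported metrics. The sketch below follows the definitions commonly used in the action segmentation literature: frame-wise accuracy (Acc), a segmental edit score based on Levenshtein distance over segment labels (Edit), and a segmental F1 score at an IoU threshold (F1@k). It is an assumption that the numbers above were computed exactly this way, and the function names are illustrative only.

```python
def segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


def edit_score(pred, gt):
    """100 * (1 - normalized Levenshtein distance between segment label sequences)."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


def f1_at_k(pred, gt, threshold):
    """Segmental F1: a predicted segment is a true positive if its IoU with an
    unmatched ground-truth segment of the same label reaches the threshold."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example: one boundary is off by a single frame.
gt = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
print(frame_accuracy(pred, gt), edit_score(pred, gt), f1_at_k(pred, gt, 0.5))
```

Edit and F1@k operate on segments and not on individual frames, so they penalize over-segmentation (many short spurious segments) that frame-wise accuracy alone can hide.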