6 months ago

Convolutional Neural Network

Action Recognition

Video Processing

Method/Architecture

Computer Vision

Heiko Neumann Wolfgang Mader Christian Jarvers Basavaraj Hampiholi

Abstract

Fine-grained temporal action segmentation in long, untrimmed RGB videos is a key topic in visual human-machine interaction. Recent temporal convolution based approaches either use an encoder-decoder (ED) architecture or dilations with a doubling factor in consecutive convolution layers to segment actions in videos. However, ED networks operate on low temporal resolution, and the dilations in successive layers cause gridding artifacts. We propose a depthwise separable temporal convolution network (DS-TCN) that operates on full temporal resolution and with reduced gridding effects. The basic component of DS-TCN is the residual depthwise dilated block (RDDB). We explore the trade-off between large kernels and small dilation rates using RDDB. We show that our DS-TCN is capable of capturing long-term dependencies as well as local temporal cues efficiently. Our evaluation on three benchmark datasets, GTEA, 50Salads, and Breakfast, demonstrates that DS-TCN outperforms the existing ED-TCN and dilation based TCN baselines even with comparatively fewer parameters.
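The trade-off between kernel size, dilation rate, and parameter count mentioned in the abstract can be made concrete with some simple arithmetic. The sketch below is illustrative only: the function names, layer counts, kernel sizes, and channel widths are assumptions chosen for the example and are not taken from the paper. It uses the standard receptive-field formula for stacked stride-1 dilated 1D convolutions and the standard parameter counts for a full convolution versus a depthwise convolution followed by a pointwise (1x1) convolution.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated 1D convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


def conv1d_params(c_in, c_out, k, bias=True):
    """Weights (plus optional biases) of a standard 1D convolution."""
    return c_in * c_out * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Weights (plus optional biases) of a depthwise conv followed by a 1x1 conv."""
    depthwise = c_in * k + (c_in if bias else 0)       # one k-tap filter per channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1x1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical configurations, not the settings used in the paper.
doubling_rf = receptive_field([3] * 10, [2 ** i for i in range(10)])
large_kernel_rf = receptive_field([15] * 3, [1, 2, 4])

standard_k15 = conv1d_params(64, 64, 15)
separable_k15 = depthwise_separable_params(64, 64, 15)
```

With these assumed settings, ten kernel-3 layers with doubling dilations span 2047 frames, while three kernel-15 layers with dilations 1, 2, 4 span 99 frames. A standard 64-channel convolution with kernel 15 has 61504 parameters, whereas the depthwise separable version has 5184, which is why large kernels are much cheaper in the separable form.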

