Shenglan Liu, YuHan Wang, Li Xu, Jie Zhu, Lianyu Hu, Lin Feng, Kaiyuan Liu, Zhuben Dong, Yunheng Li
Abstract
Due to boundary ambiguity and over-segmentation, identifying every frame in long untrimmed videos remains challenging. To address these problems, we present the Efficient Two-Step Network (ETSN), which has two components. The first step of ETSN is the Efficient Temporal Series Pyramid Network (ETSPNet), which captures both local and global frame-level features and provides accurate predictions of segmentation boundaries. The second step is a novel unsupervised approach called Local Burr Suppression (LBS), which significantly reduces over-segmentation errors. Our empirical evaluations on the 50Salads, GTEA, and Breakfast benchmarks demonstrate that ETSN outperforms current state-of-the-art methods by a large margin.
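To make the over-segmentation problem concrete, here is a minimal sketch of a generic post-processing step that relabels very short runs of frames with a neighboring label. It is not the paper's Local Burr Suppression, whose details are not given in the abstract; the function names and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def suppress_short_segments(labels, min_len):
    """Relabel runs shorter than min_len (hypothetical threshold, in frames).

    A short run takes the label of the frame just before it, or the frame
    just after it when it sits at the very start of the sequence.
    This is an illustrative heuristic, not the LBS method from the paper.
    """
    smoothed = list(labels)
    for label, start, end in to_segments(labels):
        if end - start >= min_len:
            continue
        if start > 0:
            fill = smoothed[start - 1]
        elif end < len(smoothed):
            fill = smoothed[end]
        else:
            fill = label  # the whole sequence is one short run; keep it
        smoothed[start:end] = [fill] * (end - start)
    return smoothed


# Frame-wise prediction with two one-frame spurious segments (label 1).
noisy = [0, 0, 0, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2, 1, 2, 2]
clean = suppress_short_segments(noisy, min_len=2)
print(len(to_segments(noisy)), "segments before,", len(to_segments(clean)), "after")
```

In this toy sequence the two isolated frames split two actions into six segments; removing them restores two segments while changing only two frame labels, which is why over-segmentation hurts segment-level metrics far more than frame accuracy.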
Benchmarks
| Benchmark | Methodology | Acc | Edit | F1@10% | F1@25% | F1@50% | Average F1 |
|---|---|---|---|---|---|---|---|
| action-segmentation-on-50-salads-1 | ETSN | 82.0 | 78.8 | 85.2 | 83.9 | 75.4 | - |
| action-segmentation-on-breakfast-1 | ETSN | 67.8 | 70.3 | 74.0 | 69.0 | 56.2 | 66.4 |
| action-segmentation-on-gtea-1 | ETSN | 78.2 | 86.2 | 91.1 | 90.0 | 77.9 | - |
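The metrics in the table can be sketched as follows, assuming the definitions commonly used in action segmentation work: frame-wise accuracy, a segmental edit score based on Levenshtein distance over the sequence of segment labels, and a segmental F1 score at an IoU threshold. This is an illustrative implementation under those assumptions, not the authors' evaluation code, and the function names are hypothetical.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """100 * (1 - normalized Levenshtein distance) between segment label sequences."""
    p = [label for label, _ in groupby(pred)]
    g = [label for label, _ in groupby(gt)]
    m, n = len(p), len(g)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 100.0 * (1.0 - d[m][n] / max(m, n))


def f1_at_iou(pred, gt, threshold):
    """Segmental F1: a predicted segment is a true positive if its IoU with an
    unmatched ground-truth segment of the same label reaches the threshold."""
    gt_segs = to_segments(gt)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for label, s, e in to_segments(pred):
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            union = max(e, ge) - min(s, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


gt = [0] * 8 + [2] * 8
pred = [0, 0, 0, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2, 1, 2, 2]
print(frame_accuracy(pred, gt), edit_score(pred, gt), f1_at_iou(pred, gt, 0.5))
```

In this toy example only two of sixteen frames are wrong, so accuracy stays high, but the two spurious one-frame segments pull the edit and F1 scores down sharply. The Average F1 reported for Breakfast matches the mean of the three F1 values in that row.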