Yu Ping, Zhao Yang, Li Chunyuan, Yuan Junsong, Chen Changyou

Abstract
Generating long-range skeleton-based human actions has been a challenging problem, since small deviations in one frame can cause a malformed action sequence. Most existing methods borrow ideas from video generation and naively treat skeleton nodes/joints as pixels of images, without considering the rich inter-frame and intra-frame structure information, leading to potentially distorted actions. Graph convolutional networks (GCNs) are a promising way to leverage structure information to learn structure representations. However, directly adopting GCNs to tackle such continuous action sequences in both the spatial and temporal spaces is challenging, as the action graph could be huge. To overcome this issue, we propose a variant of GCNs that leverages the powerful self-attention mechanism to adaptively sparsify a complete action graph in the temporal space. Our method can dynamically attend to important past frames and construct a sparse graph to apply in the GCN framework, capturing the structure information in action sequences well. Extensive experimental results demonstrate the superiority of our method over existing methods on two standard human action datasets.
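The core mechanism described above, self-attention scores selecting a sparse set of important past frames whose edges then drive a graph-convolution step, can be sketched as follows. This is a minimal PyTorch illustration under our own assumptions (the `SparseTemporalAttentionGCN` class, the top-k sparsification rule, and all tensor shapes and layer sizes are hypothetical), not the authors' released implementation.

```python
# Illustrative sketch: attention-sparsified temporal graph + graph convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseTemporalAttentionGCN(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, top_k: int = 4):
        super().__init__()
        self.query = nn.Linear(in_dim, in_dim)
        self.key = nn.Linear(in_dim, in_dim)
        self.gcn_weight = nn.Linear(in_dim, out_dim)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, features) -- per-frame skeleton features.
        B, T, D = x.shape
        # Scaled dot-product attention scores between all frame pairs.
        scores = self.query(x) @ self.key(x).transpose(1, 2) / D ** 0.5  # (B, T, T)

        # Causal mask: each frame may only attend to itself and past frames.
        causal = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~causal, float("-inf"))

        # Sparsify: keep only the top-k highest-scoring past frames per frame,
        # dropping all other temporal edges from the complete action graph.
        k = min(self.top_k, T)
        topk_vals, topk_idx = scores.topk(k, dim=-1)
        sparse = torch.full_like(scores, float("-inf"))
        sparse.scatter_(-1, topk_idx, topk_vals)

        # Softmax over the surviving edges yields a sparse, row-normalized
        # temporal adjacency; a graph-convolution step then aggregates.
        adj = F.softmax(sparse, dim=-1)          # (B, T, T)
        return F.relu(self.gcn_weight(adj @ x))  # (B, T, out_dim)


if __name__ == "__main__":
    layer = SparseTemporalAttentionGCN(in_dim=32, out_dim=64, top_k=4)
    frames = torch.randn(2, 50, 32)  # 2 sequences, 50 frames, 32-dim features
    print(layer(frames).shape)       # torch.Size([2, 50, 64])
```

The top-k step stands in for whatever sparsification rule the paper actually uses; the point is that attention turns a dense frame-by-frame graph into a small, dynamically chosen neighborhood per frame before the GCN aggregation.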
Benchmarks
| Benchmark | Method | Metrics (MMD; lower is better) |
|---|---|---|
| human-action-generation-on-human3-6m | SA-GCN | MMDa: 0.146, MMDs: 0.134 |
| human-action-generation-on-ntu-rgb-d-2d | SA-GCN | MMDa (Cross-Subject): 0.285, MMDa (Cross-View): 0.316, MMDs (Cross-Subject): 0.299, MMDs (Cross-View): 0.335 |