3 months ago

Rethinking Dilated Convolution for Real-time Semantic Segmentation

Roland Gao

Abstract

The field-of-view is an important metric when designing a model for semantic segmentation. To obtain a large field-of-view, previous approaches generally downsample the resolution rapidly, usually with average pooling or stride-2 convolutions. We take a different approach: we use dilated convolutions with large dilation rates throughout the backbone, which lets the backbone tune its field-of-view simply by adjusting its dilation rates, and we show that this approach is competitive with existing ones. To use dilated convolutions effectively, we give a simple upper bound on the dilation rate that avoids leaving gaps between the convolutional weights, and we design an SE-ResNeXt-inspired block structure that uses two parallel $3\times 3$ convolutions with different dilation rates to preserve local details. Manually tuning the dilation rates for every block can be difficult, so we also introduce a differentiable neural architecture search method that uses gradient descent to optimize the dilation rates. In addition, we propose a lightweight decoder that restores local information better than common alternatives. To demonstrate the effectiveness of our approach, we show that our model, RegSeg, achieves competitive results on the real-time Cityscapes and CamVid benchmarks. Using a T4 GPU with mixed precision, RegSeg achieves 78.3 mIoU on the Cityscapes test set at $37$ FPS and 80.9 mIoU on the CamVid test set at $112$ FPS, both without ImageNet pretraining.
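To make the trade-off between downsampling and dilation concrete, the following minimal sketch computes the field-of-view of a stack of convolutions using the standard receptive-field recurrence. The two layer stacks are hypothetical and chosen only for illustration; they are not RegSeg's configuration, and the sketch does not reproduce the paper's upper bound on the dilation rate.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input positions, covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def field_of_view(layers):
    """Field-of-view (along one axis) of a stack of convolutions.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    applied in order from input to output.
    """
    fov = 1   # before any layer, one output position sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent feature positions
    for kernel_size, stride, dilation in layers:
        fov += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return fov


# Hypothetical stacks for illustration only (not taken from the paper).
downsampling_stack = [(3, 2, 1)] * 4               # four stride-2 3x3 convs
dilated_stack = [(3, 1, d) for d in (1, 2, 4, 8)]  # stride 1, growing dilation

print(field_of_view(downsampling_stack))  # 31
print(field_of_view(dilated_stack))       # 31
```

In this toy comparison both stacks reach the same field-of-view of 31 pixels, but the dilated stack keeps the full input resolution, whereas the downsampling stack ends at 1/16 of it.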

Code Repositories

RolandGao/RegSeg (Official, pytorch, Mentioned in GitHub)
Deci-AI/super-gradients (pytorch, Mentioned in GitHub)

Benchmarks

Benchmark: real-time-semantic-segmentation-on-camvid
Methodology: RegSeg (Cityscapes-Pretrained)
Metrics: Frame (fps): 70; Time (ms): 14; mIoU: 80.9

Benchmark: real-time-semantic-segmentation-on-cityscapes
Methodology: RegSeg (no ImageNet pretraining)
Metrics: Frame (fps): 30; Time (ms): 33; mIoU: 78.3%
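The frame-rate and time figures listed above can be cross-checked with a short calculation. This is a minimal sketch that assumes the time column is per-frame latency, i.e. the reciprocal of throughput at batch size 1; the listing itself does not state how the time was measured.

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps


# Frame rates as listed in the benchmark entries above.
for name, fps in (("CamVid", 70), ("Cityscapes", 30)):
    print(f"{name}: {latency_ms(fps):.1f} ms")
# CamVid: 14.3 ms
# Cityscapes: 33.3 ms
```

Rounded to whole milliseconds, these values match the listed 14 ms and 33 ms.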
