
Abstract
Progress in maritime obstacle detection has been held back by the lack of diverse datasets: existing ones fail to adequately capture the complexity and variability of aquatic environments. To address this, we present LaRS (Lakes, Rivers, and Seas), the first panoptic obstacle detection benchmark for maritime scenes, spanning lakes, rivers, and seas. Our main contribution is a new dataset with the broadest coverage among related datasets, leading in the diversity of capture locations, scene types, obstacle categories, and acquisition conditions. LaRS contains over 4,000 key frames with per-pixel annotations; each key frame is accompanied by nine preceding temporal frames to enable the use of temporal texture, for a total of over 40,000 frames. Each key frame is annotated with 8 "thing" classes, 3 "stuff" classes, and 19 global scene attributes. We evaluate 27 semantic and panoptic segmentation methods and report several performance insights and directions for future research. To enable objective and fair evaluation, we implemented and deployed an online evaluation server. The LaRS dataset, evaluation toolkit, and benchmark are publicly available at: https://lojzezust.github.io/lars-dataset
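The abstract fixes the per-sample structure of the dataset: one annotated key frame plus nine preceding temporal frames. The sketch below shows one plausible way to load such a sample; the directory layout, filename pattern, and the `load_sample` helper are illustrative assumptions, not the official toolkit's API.

```python
import os
from PIL import Image

def load_sample(root, frame_id, num_prev=9):
    """Load a LaRS-style sample: key frame, preceding frames, panoptic mask.

    Assumed (hypothetical) layout:
      images/<frame_id>.jpg           -- annotated key frame
      panoptic_masks/<frame_id>.png   -- per-pixel panoptic annotation
      temporal/<frame_id>_t-K.jpg     -- K-th preceding frame, K = 1..9
    """
    key = Image.open(os.path.join(root, "images", f"{frame_id}.jpg"))
    mask = Image.open(os.path.join(root, "panoptic_masks", f"{frame_id}.png"))
    prev = [
        Image.open(os.path.join(root, "temporal", f"{frame_id}_t-{k}.jpg"))
        for k in range(1, num_prev + 1)
    ]
    return key, prev, mask
```

Methods that ignore temporal texture would simply discard `prev`; video models such as WaSR-T or TMANet (see the benchmark tables below) would consume the full ten-frame window.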
Code Repositories
- lojzezust/lars_evaluator (mentioned in GitHub)
- lojzezust/mmsegmentation-macvi (official, PyTorch)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| panoptic-segmentation-on-lars | Panoptic FPN (ResNet-101) | PQ: 38.7 |
| panoptic-segmentation-on-lars | Mask2Former (Swin-T) | PQ: 39.2 |
| panoptic-segmentation-on-lars | Panoptic Deeplab (ResNet-50) | PQ: 34.7 |
| panoptic-segmentation-on-lars | Mask2Former (ResNet-50) | PQ: 37.6 |
| panoptic-segmentation-on-lars | Mask2Former (ResNet-101) | PQ: 37.2 |
| panoptic-segmentation-on-lars | Mask2Former (Swin-B) | PQ: 41.7 |
| panoptic-segmentation-on-lars | Panoptic FPN (ResNet-50) | PQ: 40.1 |
| panoptic-segmentation-on-lars | MaX-DeepLab | PQ: 31.9 |
| semantic-segmentation-on-lars | BiSeNetv1 (ResNet-50) | F1: 42.8 Q: 39.4 mIoU: 92.2 μ: 73.3 |
| semantic-segmentation-on-lars | STDC1 | F1: 61.8 Q: 57.8 mIoU: 93.6 μ: 75.6 |
| semantic-segmentation-on-lars | STDC2 | F1: 64.3 Q: 60.8 mIoU: 94.5 μ: 76.5 |
| semantic-segmentation-on-lars | BiSeNetv2 | F1: 54.7 Q: 51.2 mIoU: 93.5 μ: 73.9 |
| semantic-segmentation-on-lars | IntCatchAI | F1: 44.9 Q: 20.5 mIoU: 45.6 μ: 62.4 |
| semantic-segmentation-on-lars | PointRend | F1: 65.4 Q: 62.1 mIoU: 94.9 μ: 77.5 |
| semantic-segmentation-on-lars | DeepLabv3 (ResNet-101) | F1: 66.1 Q: 62.9 mIoU: 95.2 μ: 77.5 |
| semantic-segmentation-on-lars | Segmenter (ViT-B) | F1: 55.2 Q: 52.6 mIoU: 95.1 μ: 72.2 |
| semantic-segmentation-on-lars | WODIS (ResNet-101) | F1: 47.5 Q: 40.7 mIoU: 85.7 μ: 63.0 |
| semantic-segmentation-on-lars | SegFormer (MiT-B2) | F1: 70.0 Q: 67.8 mIoU: 96.8 μ: 78.6 |
| semantic-segmentation-on-lars | KNet (Swin-T) | F1: 73.4 Q: 71.3 mIoU: 97.2 μ: 78.8 |
| semantic-segmentation-on-lars | UNet | F1: 15.4 Q: 13.9 mIoU: 90.1 μ: 75.7 |
| semantic-segmentation-on-lars | DeepLabv3+ (ResNet-101) | F1: 64.0 Q: 61.0 mIoU: 95.4 μ: 77.8 |
| semantic-segmentation-on-lars | FCN (ResNet-50) | F1: 57.9 Q: 53.6 mIoU: 92.6 μ: 76.8 |
| semantic-segmentation-on-lars | WaSR (ResNet-101) | F1: 61.6 Q: 59.5 mIoU: 96.6 μ: 71.0 |
| semantic-segmentation-on-lars | FCN (ResNet-101) | F1: 63.4 Q: 60.2 mIoU: 95.0 μ: 77.4 |
| video-semantic-segmentation-on-lars | TMANet (ResNet-50) | F1: 61.1 Q: 57.5 mIoU: 94.1 μ: 77.1 |
| video-semantic-segmentation-on-lars | CSANet (ResNet-101) | F1: 52.1 Q: 49.1 mIoU: 94.2 μ: 63.7 |
| video-semantic-segmentation-on-lars | WaSR-T (ResNet-101) | F1: 62.1 Q: 60.1 mIoU: 96.7 μ: 71.1 |
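The panoptic track above reports PQ (panoptic quality). For reference, here is a minimal sketch of the standard PQ computation (Kirillov et al., 2019), assuming predicted and ground-truth segments have already been matched at IoU > 0.5; this mirrors the common definition and is not necessarily the exact implementation used by the LaRS evaluator.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Standard PQ: sum of IoUs over true-positive segment matches,
    divided by |TP| + 0.5*|FP| + 0.5*|FN|.

    tp_ious -- IoU values of matched (predicted, ground-truth) segment pairs
    num_fp  -- unmatched predicted segments (false positives)
    num_fn  -- unmatched ground-truth segments (false negatives)
    """
    tp = len(tp_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(tp_ious) / denom if denom > 0 else 0.0

# Example: three matches with IoUs 0.9, 0.8, 0.7, one FP, one FN -> PQ = 0.6
print(panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1))
```

The semantic tracks additionally report detection F1, water-edge accuracy μ, and an overall quality score Q that combines segmentation and detection performance; their precise definitions are given in the LaRS paper and implemented in lojzezust/lars_evaluator.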