
Abstract
Densely annotating LiDAR point clouds is costly, which limits the scalability of fully supervised learning methods. In this work, we study semi-supervised learning (SSL), which remains underexplored for LiDAR segmentation. Our core idea is to leverage the strong spatial structure of LiDAR point clouds to better exploit unlabeled data. To this end, we propose LaserMix, which mixes laser beams from different LiDAR scans and encourages the model to make consistent and confident predictions on the same scene before and after mixing. The proposed framework has three appealing properties: 1) Generic: LaserMix is agnostic to the LiDAR representation (e.g., range view or voxel), so our SSL framework applies broadly. 2) Theoretically grounded: we provide a detailed analysis that explains the effectiveness and applicability conditions of the proposed framework. 3) Effective: comprehensive experiments on popular LiDAR segmentation datasets (nuScenes, SemanticKITTI, and ScribbleKITTI) demonstrate its effectiveness and superiority. Notably, with only 1/2 to 1/5 of the labels required by fully supervised methods, our approach achieves comparable or even better performance, while improving the supervised-only baseline by 10.8% on average. We hope this concise yet high-performing framework can facilitate future research on semi-supervised LiDAR segmentation. The code is publicly available.
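To make the mixing step concrete, below is a minimal sketch in Python/NumPy of the beam-level mixing described above. It is not the authors' implementation (see ldkong1205/LaserMix for the official code); the function name `lasermix_mix`, the inclination bounds, and the number of areas are illustrative assumptions. The idea: bin points by their inclination (pitch) angle into non-overlapping angular areas, then intertwine alternating areas from two scans, carrying the labels or pseudo-labels along with the points.

```python
# A minimal sketch of the LaserMix mixing idea (assumptions noted below;
# not the authors' official code). Points are binned by laser-beam
# inclination, and alternating areas from two scans are intertwined.
import numpy as np

def lasermix_mix(points_a, labels_a, points_b, labels_b, num_areas=6):
    """Mix two LiDAR scans along the inclination (pitch) axis.

    points_*: (N, 3+) arrays with x, y, z in the first three columns.
    labels_*: (N,) per-point labels or pseudo-labels.
    """
    def inclination(pts):
        # Pitch angle of each point relative to the sensor origin.
        return np.arctan2(pts[:, 2], np.linalg.norm(pts[:, :2], axis=1))

    # Shared angular bins; these bounds are typical for a rotating LiDAR
    # (roughly -25 to 3 degrees) and are an assumption, not a constant
    # from the paper.
    edges = np.linspace(np.radians(-25.0), np.radians(3.0), num_areas + 1)
    area_a = np.clip(np.digitize(inclination(points_a), edges) - 1,
                     0, num_areas - 1)
    area_b = np.clip(np.digitize(inclination(points_b), edges) - 1,
                     0, num_areas - 1)

    # Intertwine: even-indexed areas from scan A, odd-indexed from scan B.
    # (The complementary selection would yield the second mixed scan.)
    take_a = area_a % 2 == 0
    take_b = area_b % 2 == 1
    mixed_points = np.concatenate([points_a[take_a], points_b[take_b]])
    mixed_labels = np.concatenate([labels_a[take_a], labels_b[take_b]])
    return mixed_points, mixed_labels
```

In an SSL loop, one input would typically be a labeled scan and the other an unlabeled scan carrying teacher pseudo-labels; training the student so its predictions on the mixed scan agree with the mixed (pseudo-)labels realizes the consistency-and-confidence objective described above.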
Code Repositories
- ldkong1205/LaserMix (official; PyTorch; mentioned in GitHub)
- yuan-zm/dgt-st (PyTorch; mentioned in GitHub)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| semi-supervised-semantic-segmentation-on-1 | LaserMix (DeepLab v3+, ImageNet pre-trained ResNet50, single-scale inference) | Validation mIoU: 78.3% |
| semi-supervised-semantic-segmentation-on-2 | LaserMix (DeepLab v3+, ImageNet pre-trained ResNet50, single-scale inference) | Validation mIoU: 77.1% |
| semi-supervised-semantic-segmentation-on-23 | LaserMix (Range View) | mIoU (1% Labels): 38.3; mIoU (10% Labels): 54.4; mIoU (20% Labels): 55.6; mIoU (50% Labels): 58.7 |
| semi-supervised-semantic-segmentation-on-23 | LaserMix (Voxel) | mIoU (1% Labels): 44.2; mIoU (10% Labels): 53.7; mIoU (20% Labels): 55.1; mIoU (50% Labels): 56.8 |
| semi-supervised-semantic-segmentation-on-24 | LaserMix (Voxel) | mIoU (1% Labels): 50.6; mIoU (10% Labels): 60.0; mIoU (20% Labels): 61.9; mIoU (50% Labels): 62.3 |
| semi-supervised-semantic-segmentation-on-24 | LaserMix (Range View) | mIoU (1% Labels): 43.4; mIoU (10% Labels): 58.8; mIoU (20% Labels): 59.4; mIoU (50% Labels): 61.4 |
| semi-supervised-semantic-segmentation-on-25 | LaserMix (Range View) | mIoU (1% Labels): 49.5; mIoU (10% Labels): 68.2; mIoU (20% Labels): 70.6; mIoU (50% Labels): 73.0 |
| semi-supervised-semantic-segmentation-on-25 | LaserMix (Voxel) | mIoU (1% Labels): 55.3; mIoU (10% Labels): 69.9; mIoU (20% Labels): 71.8; mIoU (50% Labels): 73.2 |
| semi-supervised-semantic-segmentation-on-8 | LaserMix (DeepLab v3+, ImageNet pre-trained ResNet50, single-scale inference) | Validation mIoU: 79.1% |