MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Abstract
Operating rooms (ORs) are complex, high-stakes environments requiring precise understanding of interactions among medical staff, tools, and equipment for enhancing surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, we introduce MM-OR, a realistic and large-scale multimodal spatiotemporal OR dataset, and the first dataset to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. Further, we propose MM2SG, the first multimodal large vision-language model for scene graph generation, and through extensive experiments, demonstrate its ability to effectively leverage multimodal inputs. Together, MM-OR and MM2SG establish a new benchmark for holistic OR understanding, and open the path towards multimodal scene analysis in complex, high-stakes environments. Our code and data are available at https://github.com/egeozsoy/MM-OR.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| scene-graph-generation-on-4d-or | MM2SG | F1: 0.901 |
| scene-graph-generation-on-mm-or | MM2SG | Macro F1: 0.529 |
| video-panoptic-segmentation-on-4d-or | MM-OR-VPQ4 | VPQ: 69.8 |
| video-panoptic-segmentation-on-4d-or | MM-OR-VPQ8 | VPQ: 69.2 |
| video-panoptic-segmentation-on-mm-or | MM-OR-VPQ4 | VPQ: 67.0 |
| video-panoptic-segmentation-on-mm-or | MM-OR-VPQ8 | VPQ: 66.4 |
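The scene graph rows above report F1 and Macro F1. As a rough illustration of what a macro-averaged F1 measures, the sketch below computes per-class F1 over relation labels and averages the classes with equal weight, so rare relations count as much as frequent ones. It is a generic sketch, not the benchmark's evaluation code: the function name `macro_f1`, the one-label-per-entity-pair input format, and the example relation labels are all assumptions made for illustration, and the official protocol may differ.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Macro-averaged F1 over relation labels.

    gold, pred: equal-length lists holding one label per entity pair
    (an assumed input format, chosen for illustration only).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label

    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    # Unweighted mean: every class contributes equally.
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relation labels for four entity pairs.
gold = ["holding", "holding", "cutting", "none"]
pred = ["holding", "cutting", "cutting", "none"]
score = macro_f1(gold, pred)
```

In this toy case, "holding" and "cutting" each score 2/3 and "none" scores 1, so the macro average is 7/9. A micro-averaged score on the same data would instead pool all pairs before computing F1, which lets frequent classes dominate.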