MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments


Abstract

Operating rooms (ORs) are complex, high-stakes environments requiring precise understanding of interactions among medical staff, tools, and equipment to enhance surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, we introduce MM-OR, a realistic and large-scale multimodal spatiotemporal OR dataset, and the first dataset to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. Further, we propose MM2SG, the first multimodal large vision-language model for scene graph generation, and through extensive experiments demonstrate its ability to effectively leverage multimodal inputs. Together, MM-OR and MM2SG establish a new benchmark for holistic OR understanding, and open the path towards multimodal scene analysis in complex, high-stakes environments. Our code and data are available at https://github.com/egeozsoy/MM-OR.
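To make the semantic scene graph annotations mentioned above concrete, the sketch below shows one plausible way to represent a single OR frame as subject-predicate-object triplets over the entities present in the scene. The `ORSceneGraph` container and the specific entity and relation names are illustrative assumptions for this sketch, not MM-OR's actual annotation schema; see the repository for the real data format.

```python
# Hypothetical sketch of an OR scene-graph annotation as subject-predicate-object
# triplets; class, entity, and relation names are illustrative, not MM-OR's schema.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ORSceneGraph:
    """One timestamp of an OR scene: entities plus pairwise relations."""
    entities: List[str]                                     # e.g. "head_surgeon", "drill"
    triplets: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_relation(self, subj: str, pred: str, obj: str) -> None:
        # Only link entities that are actually present in this frame.
        assert subj in self.entities and obj in self.entities
        self.triplets.append((subj, pred, obj))


# Example frame: a surgeon drilling while an assistant helps.
frame = ORSceneGraph(entities=["head_surgeon", "assistant", "drill", "patient"])
frame.add_relation("head_surgeon", "holding", "drill")
frame.add_relation("head_surgeon", "drilling", "patient")
frame.add_relation("assistant", "assisting", "head_surgeon")

for s, p, o in frame.triplets:
    print(f"<{s}, {p}, {o}>")
```

A scene graph generation model such as MM2SG would predict these triplets per timestamp from the multimodal inputs (RGB-D, audio, transcripts, robotic logs, tracking data); the example above only illustrates the target structure, not the prediction pipeline.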

Code Repositories

egeozsoy/MM-OR (official, PyTorch)

