Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

Ye Maoyuan ; Zhang Jing ; Liu Juhua ; Liu Chenyu ; Yin Baocai ; Liu Cong ; Du Bo ; Tao Dacheng

Abstract

The Segment Anything Model (SAM), a profound vision foundation model pretrained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in segmentation across four hierarchies, including pixel-level text, word, text-line, and paragraph, while realizing layout analysis as well. Specifically, we first turn SAM into a high-quality pixel-level text segmentation (TS) model through a parameter-efficient fine-tuning approach. We use this TS model to iteratively generate the pixel-level text labels in a semi-automatic manner, unifying labels across the four text hierarchies in the HierText dataset. Subsequently, with these complete labels, we launch the end-to-end trainable Hi-SAM based on the TS architecture with a customized hierarchical mask decoder. During inference, Hi-SAM offers both automatic mask generation (AMG) mode and promptable segmentation (PS) mode. In the AMG mode, Hi-SAM segments pixel-level text foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing. As for the PS mode, Hi-SAM provides word, text-line, and paragraph masks with a single point click. Experimental results show the state-of-the-art performance of our TS model: 84.86% fgIOU on Total-Text and 88.96% fgIOU on TextSeg for pixel-level text segmentation. Moreover, compared to the previous specialist for joint hierarchical detection and layout analysis on HierText, Hi-SAM achieves significant improvements: 4.73% PQ and 5.39% F1 on the text-line level, 5.49% PQ and 7.39% F1 on the paragraph level layout analysis, requiring $20\times$ fewer training epochs. The code is available at https://github.com/ymy-k/Hi-SAM.
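
The abstract reports pixel-level results as fgIOU, that is, intersection over union computed on the text foreground class. The following is a minimal sketch of that metric for a single pair of binary masks, not the evaluation code from the Hi-SAM repository. The function name `fg_iou`, the nested-list mask representation, and the handling of two empty masks are assumptions made for illustration; how scores are aggregated over a whole dataset is not covered here.

```python
def fg_iou(pred, gt):
    """Foreground IoU, in percent, between two binary masks.

    Masks are nested lists of 0/1 with identical shape
    (a hypothetical representation chosen to avoid dependencies).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1  # pixel is text in both masks
            if p or g:
                union += 1  # pixel is text in at least one mask
    # Assumption: two empty masks are treated as a perfect match.
    if union == 0:
        return 100.0
    return 100.0 * inter / union


# Toy 2x3 masks: 2 overlapping text pixels, 4 text pixels in the union.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(fg_iou(pred, gt))
```

On real images the same computation would normally be done on arrays rather than nested lists; the loop form is used here only to keep the sketch self-contained.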

Code Repositories

ymy-k/hi-sam (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark: hierarchical-text-segmentation-on-hiertext
Methodology: Hi-SAM
Metrics:
F-score (average): 81.87
F-score (para., layout): 75.97
F-score (stroke): 83.36
F-score (text-line): 85.30
F-score (word): 82.86
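
The listed average F-score appears to be the unweighted arithmetic mean of the four per-level F-scores. The page does not state this, so treat it as an inference from the numbers; the short check below reproduces it. The dictionary keys are labels copied from the listing, and the variable names are illustrative.

```python
# Per-level F-scores as listed for Hi-SAM on HierText.
scores = {
    "para., layout": 75.97,
    "stroke": 83.36,
    "text-line": 85.30,
    "word": 82.86,
}

# Unweighted mean over the four hierarchy levels (assumed aggregation).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```

The result agrees with the reported 81.87 to two decimal places.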
