Image Generation on TextAtlasEval
Evaluation Metrics
StyledTextSynth CLIP Score
StyledTextSynth FID
StyledTextSynth OCR (Accuracy)
StyledTextSynth OCR (CER)
StyledTextSynth OCR (F1 Score)
TextScenesHQ CLIP Score
TextScenesHQ FID
TextScenesHQ OCR (Accuracy)
TextScenesHQ OCR (CER)
TextScenesHQ OCR (F1 Score)
TextVisionBlend CLIP Score
TextVisionBlend FID
TextVisionBlend OCR (Accuracy)
TextVisionBlend OCR (CER)
TextVisionBlend OCR (F1 Score)
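Among the OCR metrics, the character error rate (CER) is the one where lower is better, which is easy to misread next to accuracy and F1. The following is a minimal sketch, assuming CER means character-level edit distance divided by the length of the reference text. This page does not state the benchmark's exact normalization or OCR pipeline, so treat that definition as an assumption. The function names `edit_distance` and `cer` are illustrative and not taken from any benchmark toolkit.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate under the assumed definition: edits / len(ref)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical prompt text versus hypothetical OCR output from a generated image.
    print(cer("OPEN 24 HOURS", "OPEN 24 HOURS"))
    print(cer("hello", "hallo"))
```

Under this definition a perfect rendering scores 0, and values near 1 mean that almost none of the reference text was recovered. The value can exceed 1 when the OCR output is much longer than the reference.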
Evaluation Results
Performance of each model on this benchmark
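| Model | StyledTextSynth CLIP Score | StyledTextSynth FID | StyledTextSynth OCR (Accuracy) | StyledTextSynth OCR (CER) | StyledTextSynth OCR (F1 Score) | TextScenesHQ CLIP Score | TextScenesHQ FID | TextScenesHQ OCR (Accuracy) | TextScenesHQ OCR (CER) | TextScenesHQ OCR (F1 Score) | TextVisionBlend CLIP Score | TextVisionBlend FID | TextVisionBlend OCR (Accuracy) | TextVisionBlend OCR (CER) | TextVisionBlend OCR (F1 Score) | Paper Title | Repository |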
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dalle3 | 0.2938 | 90.70 | 30.58 | 0.78 | 38.25 | 0.3367 | 86.73 | 69.26 | - | 51.63 | 0.1938 | 153.21 | 8.38 | 0.93 | 7.94 | - | - |
| Grok3 | 0.2938 | 80.33 | 15.82 | 0.73 | 21.40 | 0.3197 | - | 35.07 | 0.57 | 37.94 | 0.1697 | - | 41.54 | 0.57 | 44.22 | - | - |
| SD3.5 Large | 0.2849 | 71.09 | 27.21 | 0.73 | 33.86 | 0.2363 | 64.44 | 19.03 | 0.73 | 24.45 | 0.1846 | 118.85 | 14.55 | 0.88 | 16.25 | - | - |
| PixArt-Sigma | 0.2764 | 82.83 | 0.42 | 0.90 | 0.62 | 0.2347 | 72.62 | 0.34 | 0.91 | 0.53 | 0.1891 | 81.29 | 2.40 | 0.83 | 1.57 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | |
| Infinity-2B | 0.2727 | 84.95 | 0.80 | 0.93 | 1.42 | 0.2346 | 71.59 | 1.06 | 0.88 | 1.74 | 0.1979 | 95.69 | 2.98 | 0.83 | 3.44 | Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | |
| TextDiffuser2 | 0.2510 | 114.31 | 0.76 | 0.99 | 1.46 | 0.2252 | 84.10 | 0.66 | 0.96 | 1.25 | - | - | - | - | - | TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering | - |
| Anytext | 0.2501 | 117.71 | 0.35 | 0.98 | 0.66 | 0.2174 | 101.32 | 0.42 | 0.95 | 0.8 | - | - | - | - | - | AnyText: Multilingual Visual Text Generation And Editing | |