6 months ago

Abstract

Recently, many benchmarks and datasets have been developed to evaluate Vision-Language Models (VLMs) using visual question answering (VQA) pairs, and models have shown significant accuracy improvements. However, these benchmarks rarely test the model's ability to accurately complete visual entailment, for instance, accepting or refuting a hypothesis based on the image. To address this, we propose COREVQA (Crowd Observations and Reasoning Entailment), a benchmark of 5608 image and synthetically generated true/false statement pairs, with images derived from the CrowdHuman dataset, to provoke visual entailment reasoning on challenging crowded images. Our results show that even the top-performing VLMs achieve accuracy below 80%, with other models performing substantially worse (39.98%-69.95%). This significant performance gap reveals key limitations in VLMs' ability to reason over certain types of image-question pairs in crowded scenes.

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

Visual Question Answering

Ishant Chintapatla Kazuma Choji Naaisha Agarwal Andrew Lin Hannah You Charles Duong et al

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

Visual Question Answering

Ishant Chintapatla Kazuma Choji Naaisha Agarwal Andrew Lin Hannah You Charles Duong et al

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark

Ishant Chintapatla Kazuma Choji Naaisha Agarwal Andrew Lin Hannah You Charles Duong et al

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark

Ishant Chintapatla Kazuma Choji Naaisha Agarwal Andrew Lin Hannah You Charles Duong et al

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark

Ishant Chintapatla Kazuma Choji Naaisha Agarwal Andrew Lin Hannah You Charles Duong et al

Abstract

Build AI with AI

HyperAI Newsletters