Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

8 months ago

Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dongchan Shin, Hongjin Su, Zhaoqing Suo, Hongcheng Gao, Wenjing Hu, Pengcheng Yin, and 6 more

Abstract

Real-world enterprise text-to-SQL workflows often involve complex cloud or local data across various database systems, multiple SQL queries in various dialects, and diverse operations ranging from data transformation to analytics. We introduce Spider 2.0, an evaluation framework comprising 632 real-world text-to-SQL workflow problems derived from enterprise-level database use cases. The databases in Spider 2.0 are sourced from real data applications, often contain over 1,000 columns, and are stored in local or cloud database systems such as BigQuery and Snowflake. We show that solving problems in Spider 2.0 frequently requires understanding and searching through database metadata, dialect documentation, and even project-level codebases. This challenge calls for models to interact with complex SQL workflow environments, process extremely long contexts, perform intricate reasoning, and generate multiple SQL queries with diverse operations, often exceeding 100 lines, which goes far beyond traditional text-to-SQL challenges. Our evaluations indicate that our code agent framework, based on o1-preview, solves only 21.3% of the tasks, compared with 91.2% on Spider 1.0 and 73.0% on BIRD. These results show that while language models have demonstrated remarkable performance in code generation, especially on prior text-to-SQL benchmarks, they require significant improvement to achieve adequate performance for real-world enterprise use. Progress on Spider 2.0 represents a crucial step toward developing intelligent, autonomous code agents for real-world enterprise settings. Our code, baseline models, and data are available at https://spider2-sql.github.io