4 months ago

A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

Abstract

To address the performance challenges that large language models face in natural-language-to-SQL (Text-to-SQL) tasks, we introduce XiYan-SQL, a framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to improve the model's understanding of database structures. To improve the quality and diversity of the generated candidate SQL queries, XiYan-SQL combines the substantial potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On one hand, we propose a series of training strategies that fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition, which prevents overemphasis on entities. A refiner then optimizes each candidate by correcting logical or syntactic errors. To identify the best candidate, we fine-tune a selection model to distinguish the nuances between candidate SQL queries. Experimental results on datasets in multiple query dialects demonstrate the robustness of XiYan-SQL across different scenarios. Overall, XiYan-SQL achieves state-of-the-art execution accuracy of 75.63% on the BIRD benchmark (test set), 89.65% on the Spider test set, 69.86% on SQL-Eval, and 41.20% on NL2GQL. The proposed framework not only improves the quality and diversity of the generated SQL queries but also outperforms previous methods.
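The abstract only names the NER-based example selection. The sketch below illustrates one plausible reading of "preventing overemphasis on entities": detected entities are masked before similarity is computed, so few-shot examples are ranked by question structure rather than by shared entity names. The regex "NER", the bag-of-words cosine similarity, the example pool, and the names `mask_entities` and `select_examples` are all hypothetical stand-ins; this is not XiYan-SQL's actual implementation, which would use a real NER model and, most likely, learned embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real NER model: quoted strings and numbers are
# always entities; capitalized words after the first token are treated as
# entity names (a crude heuristic, for illustration only).
_QUOTED_OR_NUMBER = re.compile(r"'[^']*'|\"[^\"]*\"|\b\d+(?:\.\d+)?\b")
ENT = "<ent>"


def mask_entities(question: str) -> list[str]:
    """Lowercase the question's tokens, replacing detected entities with <ent>."""
    text = _QUOTED_OR_NUMBER.sub(f" {ENT} ", question)
    tokens = re.findall(r"<ent>|[A-Za-z_]+", text)
    masked: list[str] = []
    for i, tok in enumerate(tokens):
        if tok == ENT or (i > 0 and tok[0].isupper()):
            # Collapse multi-word entities such as "New York" into one <ent>.
            if not masked or masked[-1] != ENT:
                masked.append(ENT)
        else:
            masked.append(tok.lower())
    return masked


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def select_examples(question: str, pool: list[tuple[str, str]], k: int = 2):
    """Return the k (question, sql) examples whose masked questions are most similar."""
    masked_q = mask_entities(question)
    ranked = sorted(
        pool, key=lambda ex: cosine(masked_q, mask_entities(ex[0])), reverse=True
    )
    return ranked[:k]


# Hypothetical example pool of (question, SQL) pairs.
POOL = [
    ("How many schools are in Alameda county?",
     "SELECT COUNT(*) FROM schools WHERE county = 'Alameda'"),
    ("List the names of Fresno schools opened after 1990.",
     "SELECT name FROM schools WHERE county = 'Fresno' AND opened > 1990"),
    ("How many students are enrolled in 'Lincoln High'?",
     "SELECT enrollment FROM schools WHERE name = 'Lincoln High'"),
]

if __name__ == "__main__":
    for q, sql in select_examples("How many schools are in Fresno county?", POOL):
        print(q, "->", sql)
```

With masking, "Fresno" and "Alameda" both become `<ent>`, so the Alameda counting question gets a similarity of 1.0 to the Fresno query and ranks first even though it names a different county, while the Fresno listing question, which shares the entity but not the question shape, ranks last.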

Benchmarks

| Benchmark | Methodology | Metric | Value |
|---|---|---|---|
| Text-to-SQL on BIRD | XiYan-SQL | Execution Accuracy % (Dev) | 73.34 |
| Text-to-SQL on BIRD | XiYan-SQL | Execution Accuracy % (Test) | 75.63 |
| Text-to-SQL on Spider | XiYan-SQL | Execution Accuracy % (Test) | 89.65 |
| Text-to-SQL on SQL-Eval | XiYan-SQL | Execution Accuracy % | 69.86 |
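All scores above are execution accuracy (EX): the share of questions for which the predicted SQL, when executed, returns the same result as the gold SQL. As a rough illustration of what the metric measures, the sketch below runs each (predicted, gold) pair against an in-memory SQLite database and compares the results as unordered multisets of rows. The toy `schools` table, the query pairs, and the function names are hypothetical, and the official BIRD, Spider, and SQL-Eval evaluators each apply their own comparison rules; this is a simplified sketch, not any of them.

```python
import sqlite3
from collections import Counter
from typing import Optional


def run(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Execute sql and return its rows as a multiset, or None if execution fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Percentage of (predicted, gold) pairs whose execution results match.

    Rows are compared as unordered multisets, so ORDER BY is ignored; a
    prediction that fails to execute counts as incorrect.
    """
    if not pairs:
        return 0.0
    correct = 0
    for pred, gold in pairs:
        pred_rows = run(conn, pred)
        if pred_rows is not None and pred_rows == run(conn, gold):
            correct += 1
    return 100.0 * correct / len(pairs)


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical toy database used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE schools (name TEXT, county TEXT, opened INTEGER);
        INSERT INTO schools VALUES
            ('A', 'Alameda', 1985), ('B', 'Fresno', 1995), ('C', 'Fresno', 2001);
        """
    )
    return conn


# (predicted SQL, gold SQL) pairs.
DEMO_PAIRS = [
    # Different SQL text, same result (a count of 2): counted as correct.
    ("SELECT COUNT(*) FROM schools WHERE county = 'Fresno'",
     "SELECT COUNT(name) FROM schools WHERE county = 'Fresno'"),
    # Same rows in a different order: correct under this order-insensitive check.
    ("SELECT name FROM schools WHERE opened > 1990 ORDER BY name DESC",
     "SELECT name FROM schools WHERE opened > 1990"),
    # Different rows: incorrect.
    ("SELECT name FROM schools WHERE county = 'Alameda'",
     "SELECT name FROM schools WHERE opened < 1980"),
    # Syntax error in the prediction: incorrect.
    ("SELEC name FROM schools", "SELECT name FROM schools"),
]

if __name__ == "__main__":
    print(execution_accuracy(build_demo_db(), DEMO_PAIRS))  # 50.0
```

Comparing execution results rather than SQL strings is what lets the first pair count as correct despite different query text; that is also why EX rewards semantically equivalent rewrites, such as those a refiner might produce.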
