A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

Abstract
To tackle the challenges of large language model performance in natural-language-to-SQL tasks, we introduce XiYan-SQL, an innovative framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to enhance the understanding of database structures. To enhance the quality and diversity of generated candidate SQL queries, XiYan-SQL combines the significant potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On the one hand, we propose a series of training strategies to fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities. The refiner optimizes each candidate by correcting logical or syntactical errors. To address the challenge of identifying the best candidate, we fine-tune a selection model to distinguish nuances of candidate SQL queries. The experimental results on multiple dialect datasets demonstrate the robustness of XiYan-SQL in addressing challenges across different scenarios. Overall, XiYan-SQL achieves state-of-the-art execution accuracy of 75.63% on the Bird benchmark, 89.65% on the Spider test set, 69.86% on SQL-Eval, and 41.20% on NL2GQL. The proposed framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods.
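The generate, refine, and select flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the generators are hypothetical callables standing in for the fine-tuned and ICL-based models, `fix` is a hypothetical single-step repair function standing in for the refiner, and the selector is a simple majority vote over execution results rather than the fine-tuned selection model the paper describes. `schema_text` is a plain placeholder string, not the M-Schema format.

```python
import sqlite3
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical generator interface: (question, schema_text) -> SQL string.
Generator = Callable[[str, str], str]


def execute(conn: sqlite3.Connection, sql: str) -> Optional[Tuple]:
    """Run a query; return a hashable, order-insensitive result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def refine(conn: sqlite3.Connection, sql: str, fix: Callable[[str], str]) -> str:
    """Stand-in refiner: if the candidate fails to execute, apply one repair attempt."""
    return sql if execute(conn, sql) is not None else fix(sql)


def select(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Stand-in selector: return the first candidate whose result is the most common one."""
    valid = [(sql, res) for sql in candidates
             if (res := execute(conn, sql)) is not None]
    if not valid:
        return None
    top_result, _ = Counter(res for _, res in valid).most_common(1)[0]
    return next(sql for sql, res in valid if res == top_result)


def run_pipeline(conn, question: str, schema_text: str,
                 generators: List[Generator], fix: Callable[[str], str]) -> Optional[str]:
    candidates = [g(question, schema_text) for g in generators]  # multi-generator step
    refined = [refine(conn, sql, fix) for sql in candidates]     # refinement step
    return select(conn, refined)                                 # selection step


# Toy database and hard-coded "generators" for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer(id INTEGER, name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES (1, 'A', 30), (2, 'B', 41), (3, 'C', 52);"
)
generators = [
    lambda q, s: "SELECT COUNT(*) FROM singer WHERE age > 40",
    lambda q, s: "SELECT COUNT(id) FROM singer WHERE age > 40",
    lambda q, s: "SELECT COUNT(*) FROM singers WHERE age > 40",  # wrong table name
    lambda q, s: "SELECT COUNT(*) FROM singer WHERE age >= 30",  # wrong predicate
]
fix = lambda sql: sql.replace("singers", "singer")

best = run_pipeline(conn, "How many singers are older than 40?",
                    "singer(id, name, age)", generators, fix)
print(best)
```

In this toy run, three of the four refined candidates agree on the execution result, so the vote returns one of them. A learned selector, as proposed in the paper, is meant to handle cases where the majority is wrong or the candidates differ in subtler ways.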
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| text-to-sql-on-bird-big-bench-for-large-scale | XiYan-SQL | Execution Accuracy % (Dev): 73.34; Execution Accuracy % (Test): 75.63 |
| text-to-sql-on-spider | XiYan-SQL | Execution Accuracy (Test): 89.65 |
| text-to-sql-on-sql-eval-1 | XiYan-SQL | Execution Accuracy: 69.86 |
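
Execution accuracy, the metric reported above, is generally the percentage of questions for which the predicted SQL returns the same result as the gold SQL when both are run against the database. The sketch below illustrates the idea on SQLite. It assumes an order-insensitive comparison and counts a failing prediction as a miss. The official evaluators for each benchmark may differ in such details, and the helper names and toy data are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def result_rows(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    """Return the query's rows in a canonical order, or None if execution fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn: sqlite3.Connection,
                       pairs: List[Tuple[str, str]]) -> float:
    """Percentage of (predicted, gold) pairs whose execution results match."""
    hits = 0
    for predicted, gold in pairs:
        gold_rows = result_rows(conn, gold)
        pred_rows = result_rows(conn, predicted)
        if gold_rows is not None and pred_rows == gold_rows:
            hits += 1
    return 100.0 * hits / len(pairs)


# Toy database and query pairs for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer(id INTEGER, name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES (1, 'A', 30), (2, 'B', 41), (3, 'C', 52);"
)
pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT COUNT(*) FROM singer WHERE age > 40",
     "SELECT COUNT(id) FROM singer WHERE age > 40"),
    ("SELECT name FROM singer WHERE age > 40",
     "SELECT name FROM singer WHERE age >= 41"),
    # Different result: counted as wrong.
    ("SELECT COUNT(*) FROM singer WHERE age >= 30",
     "SELECT COUNT(*) FROM singer WHERE age > 30"),
    # Prediction fails to execute: counted as wrong.
    ("SELECT COUNT(*) FROM singers",
     "SELECT COUNT(*) FROM singer"),
]
accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.2f}%")
```

Because the metric compares results rather than query text, two syntactically different queries count as equivalent whenever they return the same rows on the evaluation database.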