News Release

A cost-effective text-to-SQL approach based on adaptive refinement

Peer-Reviewed Publication

Higher Education Press

Image: The overview of SEA-SQL

Credit: Higher Education Press

The Text-to-SQL task aims to bridge the gap between natural-language questions and SQL queries. Current approaches rely mainly on large language models (LLMs), but using them for Text-to-SQL has three major limitations:

  • Inherent Model Bias: When generating SQL queries, models may exhibit biases inherited from patterns in their training data.
  • Unexecutable SQL: Although LLMs can generate SQL, they cannot verify whether the final SQL query is executable.
  • Expensive Inference Cost: Using models such as GPT-4 for SQL generation is costly.

To address these issues, a research team led by Yingxia Shao published a new study on 15 March 2026 in Frontiers of Computer Science, which is co-published by Higher Education Press and Springer Nature.

The team proposed an innovative framework called Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL), which uses zero-shot prompts with GPT-3.5 to perform the Text-to-SQL task more effectively and economically. Tested on multiple datasets, the method reduced costs and improved performance compared with previous GPT-3.5-based and GPT-4-based approaches.

In this research, the team enhances the schema with semantic information to enrich the database content, and improves the generated SQL queries through adaptive bias elimination and dynamic execution adjustment.

Specifically, the semantic-enhanced schema includes column values related to the question, which helps the LLM identify and use the appropriate tables and columns when generating the SQL query. Next, an adaptive bias eliminator, a fine-tuned small LLM (e.g., Mistral-7B), removes the LLM's inherent biases from the generated SQL queries, improving their quality. Finally, dynamic execution adjustment runs an iterative loop of SQL execution and LLM reflection and correction until the SQL query executes successfully.
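To illustrate what a "semantic-enhanced schema" could look like, here is a minimal Python sketch assuming a SQLite database. It lists each table's columns and, for TEXT columns, appends any stored values that also appear as whole words in the question. The function name `semantic_enhanced_schema`, the `max_values` parameter, and the simple word-matching rule are all hypothetical choices for illustration; this is not SEA-SQL's actual retrieval method.

```python
import re
import sqlite3


def semantic_enhanced_schema(conn: sqlite3.Connection, question: str,
                             max_values: int = 3) -> str:
    """Describe every table, appending stored TEXT values that the question mentions.

    Hypothetical helper: whole-word, case-insensitive matching stands in for
    whatever value-retrieval method SEA-SQL actually uses.
    """
    question_lower = question.lower()
    table_descs = []
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        col_descs = []
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, col, col_type, *_rest in columns:
            desc = f"{col} {col_type}"
            if col_type.upper() == "TEXT":
                # Keep only values that occur as whole words in the question.
                matches = [
                    v for (v,) in conn.execute(f'SELECT DISTINCT "{col}" FROM "{table}"')
                    if isinstance(v, str) and v
                    and re.search(r"\b" + re.escape(v.lower()) + r"\b", question_lower)
                ]
                if matches:
                    desc += " -- values: " + ", ".join(repr(v) for v in matches[:max_values])
            col_descs.append(desc)
        table_descs.append(f"{table}({', '.join(col_descs)})")
    return "\n".join(table_descs)


# Toy database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?, ?)",
                 [("Joe Sharp", "Netherlands", 52), ("Rose White", "France", 41)])
schema = semantic_enhanced_schema(conn, "How old is the singer from France?")
print(schema)  # prints: singer(name TEXT, country TEXT -- values: 'France', age INTEGER)
```

In a full pipeline, this schema text would be placed in the zero-shot prompt next to the question, so the LLM sees that "France" is a value of `singer.country`.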
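The execute-reflect-correct loop of dynamic execution adjustment can be sketched in Python against SQLite. The corrector here is a plain function that receives the failing query and the database error message. In SEA-SQL that role is played by an LLM, so `correct` and the rule-based `toy_correct` below are hypothetical stand-ins. The `max_rounds` cap is also an assumption added so that the loop always terminates.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple


def execute_with_adjustment(
    conn: sqlite3.Connection,
    sql: str,
    correct: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[Optional[List[tuple]], str]:
    """Execute `sql`; on an error, ask `correct(sql, error_message)` for a revised query.

    `correct` is a hypothetical stand-in for the LLM reflection-correction step,
    and `max_rounds` is an assumed cap on correction attempts.
    Returns (rows, final_sql), with rows set to None if no attempt succeeded.
    """
    for attempt in range(max_rounds + 1):
        try:
            return conn.execute(sql).fetchall(), sql
        except sqlite3.Error as err:
            if attempt == max_rounds:
                break  # give up rather than loop forever
            sql = correct(sql, str(err))  # feed the error message back for correction
    return None, sql


def toy_correct(sql: str, error: str) -> str:
    # Toy rule standing in for an LLM: repair one misspelled table name.
    if "no such table: singers" in error:
        return sql.replace("singers", "singer")
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Rose White', 41)")
rows, final_sql = execute_with_adjustment(conn, "SELECT age FROM singers", toy_correct)
print(final_sql, rows)
```

Passing the database's own error message back to the corrector is what lets each round target the specific failure, rather than regenerating the query from scratch.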

The results show that SEA-SQL achieves state-of-the-art performance in the GPT-3.5 setting while requiring only 9% to 58% of the generation cost. Furthermore, SEA-SQL performs comparably to GPT-4 at only 0.9% to 5.3% of the generation cost.
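As a rough illustration of what these percentages mean, the short calculation below converts a "percentage of baseline" figure into an absolute cost range. The $100 baseline used for each setting is a hypothetical number chosen only to show the arithmetic. The release does not give absolute costs, which would depend on the datasets and API pricing.

```python
def cost_range(baseline_cost: float, low_pct: float, high_pct: float) -> tuple:
    """Convert a 'low% to high% of baseline' figure into an absolute cost range."""
    return baseline_cost * low_pct / 100, baseline_cost * high_pct / 100


# Hypothetical baselines of $100 each, chosen only to illustrate the arithmetic.
gpt35_low, gpt35_high = cost_range(100.0, 9, 58)
gpt4_low, gpt4_high = cost_range(100.0, 0.9, 5.3)
print(f"vs. GPT-3.5 baseline: ${gpt35_low:.2f} to ${gpt35_high:.2f}")
print(f"vs. GPT-4 baseline:   ${gpt4_low:.2f} to ${gpt4_high:.2f}")
```

Under this assumed baseline, a task costing $100 with a GPT-4-based method would cost roughly $0.90 to $5.30 with SEA-SQL.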


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.