Benchmarking Agentic LLMs on SQL Generation
I built a benchmark to evaluate how well agentic LLMs handle SQL generation tasks.
It has 25 text questions of varying difficulty, each of which an LLM must turn into a SQL query, with an agentic debugging loop that lets the model correct its own mistakes.
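To give a rough idea of what an agentic debugging loop can look like, here is a minimal sketch. It is not the benchmark's actual code: the `solve_with_retries` function, the `generate_sql` callback, the `fake_model` stand-in, the `orders` table, and the three-attempt limit are all assumptions made for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical model interface: takes the question and the previous error
# message (None on the first attempt) and returns a SQL string.
GenerateSql = Callable[[str, Optional[str]], str]


def solve_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: GenerateSql,
    max_attempts: int = 3,  # assumed limit, not taken from the benchmark
) -> Optional[list]:
    """Ask for SQL, run it, and feed any database error back to the model."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the error text so the next attempt can correct the query.
            error = str(exc)
    return None  # every attempt failed


def fake_model(question: str, error: Optional[str]) -> str:
    """Stand-in for an LLM: wrong column name first, fixed after an error."""
    if error is None:
        return "SELECT COUNT(*) FROM orders WHERE state = 'shipped'"
    return "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"


# Tiny in-memory database to run the sketch against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("shipped",), ("pending",), ("shipped",)],
)

rows = solve_with_retries(conn, "How many orders have shipped?", fake_model)
print(rows)  # [(2,)]
```

The first query fails because the `state` column does not exist. The error message is passed back to the model, and the second query succeeds. A real harness would also need to compare the returned rows against an expected answer, since a query can run without errors and still be wrong.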
Rather than duplicate the write-up here, see the full results and methodology at sql-benchmark.nicklothian.com.
Source code: github.com/nlothian/llm-sql-benchmark