I built a small tool that I keep wishing existed whenever I'm comparing LLMs
LLM Comparator

Why it's useful
Comparing models is harder than it should be. You end up running the same prompt across different tools, losing history, and struggling to make a clean side-by-side assessment of output quality, latency, and token usage.
This tool lets you run the same prompt set across multiple models and compare results in one place.
Importantly, it also lets you take a conversation from one LLM and continue it in another. This is useful when you want a powerful model for the initial prompt but don't necessarily need it for subsequent refinements.
Key features
- Side-by-side comparisons across multiple LLMs/providers
- Tracks latency + token usage
- Save/load sessions as JSON
- Share sessions via GitHub Gists
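The comparison itself amounts to a fan-out: run one prompt against every configured provider in parallel and time each call. Here is a minimal sketch of that idea, not the tool's actual code; the `providers` map of async call functions (returning `{ text, tokens }`) is an assumed interface.

```javascript
// Sketch only: fan one prompt out to several models in parallel and
// record per-model latency. The `providers` argument (model name ->
// async call function returning { text, tokens }) is a hypothetical
// interface, not the tool's real API.
async function compareAcrossModels(prompt, providers) {
  const results = {};
  await Promise.all(
    Object.entries(providers).map(async ([name, call]) => {
      const start = Date.now();
      const { text, tokens } = await call(prompt);
      results[name] = { text, tokens, latencyMs: Date.now() - start };
    })
  );
  return results;
}
```

Running the calls with `Promise.all` keeps the slowest model from serializing the whole comparison.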
Providers supported
- OpenAI
- Anthropic Claude
- Google Gemini
- OpenRouter
- Local Models (Ollama or any compatible endpoint)
Privacy by default
There's no backend storing your prompts or API keys. Everything is stored locally in your browser (using local storage), and calls are made directly from your browser to the model provider.
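That local-only storage can be sketched in a few lines. The key-naming scheme below is an assumption (the page's actual storage keys may differ), and the storage object is passed in so the same helpers work against the browser's `localStorage` or any Storage-like stand-in.

```javascript
// Sketch: API keys never leave the browser. They are written to
// localStorage (or any Storage-like object) under an illustrative
// key name -- the tool's real key names are assumptions here.
const keyName = (provider) => "llm-comparator/" + provider + "/apiKey";

function saveApiKey(storage, provider, apiKey) {
  storage.setItem(keyName(provider), apiKey);
}

function loadApiKey(storage, provider) {
  return storage.getItem(keyName(provider)); // null if never saved
}
```

In the browser you would pass `window.localStorage`; requests to the provider then attach the loaded key directly, with no intermediary server ever seeing it.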
Sharing sessions
Your most recent session is saved locally in your browser, and you can export it as a JSON file (and re-import it later to pick up where you left off).
If you want to share a comparison, you can also publish the session as a GitHub Gist (public or secret) using a GitHub token, then send a link that will load the exact same session for someone else.
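Publishing to a Gist only takes one authenticated call to GitHub's REST API. A hedged sketch follows; the gist description and file name are illustrative choices, not necessarily what the tool uses.

```javascript
// Build the request body for GitHub's "create a gist" endpoint.
// Description and file name here are illustrative, not the tool's.
function buildGistPayload(session, isPublic) {
  return {
    description: "LLM Comparator session",
    public: isPublic,
    files: {
      "session.json": { content: JSON.stringify(session, null, 2) },
    },
  };
}

// POST the session to api.github.com/gists using a personal access
// token with the "gist" scope; resolves to the shareable gist URL.
async function publishGist(session, token, isPublic = false) {
  const res = await fetch("https://api.github.com/gists", {
    method: "POST",
    headers: {
      Authorization: "Bearer " + token,
      Accept: "application/vnd.github+json",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildGistPayload(session, isPublic)),
  });
  if (!res.ok) throw new Error("Gist creation failed: HTTP " + res.status);
  return (await res.json()).html_url;
}
```

Choosing `public: false` creates a secret gist: still reachable by anyone with the link, but not listed or searchable.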
Here's an example based on the Claude Career Coach prompt.
Open Source
This is all a single HTML page with JavaScript embedded in it. Download it and deploy it wherever you want, or use my version.
Source code: GitHub