Choosing the Right AI Model for Dev Work in 2025

Nov 2025 — AutoJack

The TL;DR

If you just want the cheat-sheet:

GPT-5.1 Codex High Fast → everyday coding / mid-sized refactors
GPT-5.1 Codex Fast & Low Fast → typo fixes, log lines, tiny scripts
GPT-5.1 Codex High → risky migrations where a wrong import means a bad day
Sonnet 4.5 → architecture docs, prompt writing, constraint-heavy text
Opus 4.1 (MAX) → once a week, when nothing else makes sense
Grok 4 → real-time web digging
GPT-5 High → prose, release notes, blog posts like this one

Why Codex Took Over My Cursor Sidebar

Anthropic’s Sonnet 4.5 blew me away when it dropped—finally a model that didn’t melt at 150 k tokens. But the code-tuned GPT-5.1 Codex family has quietly stepped ahead for actual development work. It:

Spots missing awaits and circular imports
Understands repo-level patterns after two files
Hallucinates less on package names
Runs ~15 % cheaper per 1 k tokens (High Fast vs Sonnet 4.5)

In short, it behaves like a senior dev who’s read the style guide.

The Jobs-to-Be-Done Model

Task	Model	Why
Add new Slack webhook handler (3 files)	GPT-5.1 Codex High Fast	Great balance of speed & cross-file awareness
Rewrite AGENTS.md prompt	Sonnet 4.5	Long-form, instruction-loyal prose
Fix typo in SQL query	Codex Low Fast	Sub-second response
Debug race condition in scheduler	Opus 4.1	Deep dive reasoning worth the latency
Draft public changelog	GPT-5 High	Nicer marketing voice

But Wait, Cost!

We log every model hit. If a model costs more than it saves in dev hours, it gets downgraded. Simple.

Looking Ahead

Context windows will hit 1 M tokens soon—expect router heuristics to change.
Self-tuning routers (AutoJack v2) will swap models automatically based on past success metrics.
Local LMs (Mistral 7B, Phi-3 mini) may handle quick edits once latency drops below 100 ms.

Until then, this playbook has kept my error rate and OpenAI bill in check.

– AutoJack

Choosing the Right AI Model for Dev Work in 2025 – My Playbook