The Agentic AI Playbook: When to trust the model—and when to trust the code
Agentic AI is genuinely useful. It can turn “I want this, not that” into structured intent, plan multi-step tool workflows, and handle the long tail of weird edge cases that would otherwise become a swamp of brittle rules.
But the best products don’t call a model for everything.
They draw a clear line:
Use agents for ambiguity. Use deterministic logic for guarantees.
That line is how you keep systems fast, affordable, and reliable.
Where agents earn their keep
Agents are a great fit when the problem is inherently fuzzy:
- Interpreting human input (turning natural language into structured fields)
- Planning across tools (search → compare → decide → explain)
- Handling the long tail (rare, messy scenarios you don’t want to hardcode forever)
In these cases, the “agent” isn’t replacing your system — it’s filling the gaps where strict rules fall apart.
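For instance, "interpreting human input" often just means asking the model for JSON and refusing anything that doesn't fit a schema you control. A minimal sketch, assuming a hypothetical `IntentRequest` shape (the fields and the stubbed model reply are stand-ins for whatever your product actually needs):

```python
import json
from dataclasses import dataclass

@dataclass
class IntentRequest:
    """Structured intent extracted from free-form user text (hypothetical fields)."""
    action: str          # e.g. "book", "cancel", "compare"
    item: str            # what the user is acting on
    exclusions: list     # the "not that" part, made explicit

def parse_intent(raw_model_output: str) -> IntentRequest:
    """Turn the model's JSON reply into a typed object, or fail loudly."""
    data = json.loads(raw_model_output)  # raises on malformed JSON
    return IntentRequest(
        action=str(data["action"]),
        item=str(data["item"]),
        exclusions=list(data.get("exclusions", [])),
    )

# The model handles the fuzzy part; everything downstream sees typed data.
reply = '{"action": "book", "item": "window seat", "exclusions": ["red-eye flights"]}'
print(parse_intent(reply))
```

The point is the boundary: the model's creativity ends at `parse_intent`, and the rest of the system only ever sees validated, typed data.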
Where deterministic logic wins (and saves money)
There’s a lot your product can do without asking a model anything:
- Validation and parsing (schemas, formatting, normalization)
- Permissions and policy (what’s allowed, what’s not)
- Risk checks (fraud signals, rate limits, thresholds)
- Known lookups (databases, caches, catalog data)
- High-volume happy paths (predictable flows that run constantly)
This matters for cost and latency. LLM calls add up fast, and they add variability. If something can be determined with code, do it with code — and save the model calls for when you actually need judgment.
A simple rule: don’t spend tokens on things you can compute.
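As a sketch of that rule: check the deterministic paths first and return early, so the model is only consulted when code genuinely can't answer. (The catalog, the SKU pattern, and the `ask_agent` hook below are hypothetical placeholders.)

```python
import re

CATALOG = {"sku-123": 19.99, "sku-456": 4.50}  # hypothetical known lookup

SKU_PATTERN = re.compile(r"^sku-\d+$")

def price_for(query: str):
    """Answer from code when possible; fall through to the model only for ambiguity."""
    token = query.strip().lower()
    if SKU_PATTERN.match(token):       # validation/parsing: pure code
        return CATALOG.get(token)      # known lookup: pure code, zero tokens spent
    return ask_agent(query)            # fuzzy input: now a model call earns its keep

def ask_agent(query: str):
    raise NotImplementedError("model call goes here, behind the deterministic gate")

print(price_for("SKU-123"))   # resolved entirely in code: 19.99
```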
The pattern that keeps agents reliable
What works in practice is boring (in a good way):
- Deterministic pre-checks: auth, constraints, required fields, policy gates.
- Agent proposes (only when needed): return structured output (ideally JSON); keep the task narrow.
- Deterministic verification: schema validation + business rules + consistency checks.
- Safe execution + fallbacks: idempotency, retries, and a graceful degrade path (clarifying question, simpler flow, or escalation).
This gives you the upside of agentic behavior without letting the model become the source of truth.
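Here's a compressed sketch of those four steps. The names (`policy_allows`, `propose`, `verify`, the refund scenario) are illustrative, and `propose` stubs out where your real model call would go:

```python
import json

def policy_allows(request: dict) -> bool:
    """Step 1: deterministic pre-checks (auth, required fields, policy gates)."""
    return request.get("user_id") is not None and "query" in request

def propose(request: dict) -> dict:
    """Step 2: the agent proposes a narrow, structured action (stubbed here)."""
    return {"action": "refund", "amount": 20.0}  # pretend model output, already parsed

def verify(proposal: dict) -> bool:
    """Step 3: deterministic verification: schema + business rules."""
    return (
        proposal.get("action") in {"refund", "escalate"}     # schema / allowlist
        and 0 < float(proposal.get("amount", -1)) <= 100.0   # business rule
    )

def handle(request: dict) -> str:
    if not policy_allows(request):
        return "rejected before any model call"
    proposal = propose(request)
    if not verify(proposal):
        return "fallback: ask a clarifying question"         # step 4: graceful degrade
    return f"executing {json.dumps(proposal)}"               # step 4: safe execution

print(handle({"user_id": 7, "query": "refund my last order"}))
```

Note that the model's output is a *proposal*, nothing more: it can't reach execution without passing checks that code, not the model, enforces.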
Performance is a product feature
A lot of “agentic” demos feel great… until you try to ship them.
Real products have to balance:
- speed (users won’t wait)
- cost (calls scale with usage)
- reliability (tool failures, retries, edge cases)
- safety (don’t do irreversible things casually)
The trick is configuring your system so the agent only gets invoked when it adds real value. Everything else should be deterministic and cached wherever possible.
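"Cached wherever possible" can be as simple as keying on a normalized form of the input, so repeated requests never pay for a second model call. A sketch (the normalization scheme and the `call_agent` stub are illustrative):

```python
import hashlib

_cache: dict[str, str] = {}

def normalize(text: str) -> str:
    """Deterministic canonical form so trivial variations hit the same cache entry."""
    return " ".join(text.lower().split())

def answer(text: str) -> str:
    key = hashlib.sha256(normalize(text).encode()).hexdigest()
    if key in _cache:              # deterministic, instant, free
        return _cache[key]
    result = call_agent(text)      # only genuinely novel inputs reach the model
    _cache[key] = result
    return result

def call_agent(text: str) -> str:
    return "stubbed model answer"  # placeholder for the real call

print(answer("What's your refund policy?"))
print(answer("  what's  YOUR refund policy? "))  # cache hit, no second model call
```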
Prompting isn’t a one-off — it’s an iteration loop
Good prompt engineering looks less like “write a clever prompt” and more like “tighten a system over time”:
- keep prompts short and strict
- demand structured outputs
- maintain versioned prompts like code
- build feedback loops from logs
Logs are where you learn what users actually ask, where the agent drifts, what fails verification, and what the correct outcome should’ve been. That data is how you improve prompts, add examples, create evals, and reduce unnecessary model calls over time.
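One way to make "versioned prompts + feedback from logs" concrete: treat each prompt as an immutable, versioned record, and log every call against the version that produced it. A minimal sketch, with illustrative fields rather than a prescribed schema:

```python
import json
import time

PROMPTS = {
    ("extract_intent", "v3"): "Return ONLY JSON with keys: action, item, exclusions.",
}  # versioned like code: an edit becomes v4, never a silent in-place change

def log_call(prompt_id, version, user_input, raw_output, passed_verification):
    """Append-only record: exactly the data you need for evals and prompt fixes."""
    record = {
        "ts": time.time(),
        "prompt": prompt_id,
        "version": version,
        "input": user_input,
        "output": raw_output,
        "verified": passed_verification,
    }
    print(json.dumps(record))  # stand-in for your real log sink

log_call("extract_intent", "v3", "book me a seat", '{"action": "book"}', False)
```

Records like this are what let you later ask "which prompt version started failing verification, and on what inputs?" instead of guessing.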
Big model vs smaller model: how teams usually scale it
A common evolution is:
- prototype with a strong general model,
- then shift repeated tasks to a cheaper setup: a smaller distilled/fine-tuned model, or a router that chooses “cheap model vs expensive model” based on complexity,
- with a fallback to the bigger model when confidence is low.
That combo (router + small model + fallback) is how you keep quality while making performance and cost predictable.
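A sketch of that router shape, with a deliberately naive complexity heuristic and a confidence-based fallback (the heuristic, the confidence floor, and both model stubs are assumptions; real routers typically learn or tune these):

```python
def cheap_model(query: str):
    return "small-model answer", 0.55   # (answer, self-reported confidence) stub

def expensive_model(query: str):
    return "big-model answer", 0.95     # stub

def route(query: str, confidence_floor: float = 0.7) -> str:
    # Naive complexity signal: short, single-clause queries go to the small model.
    looks_simple = len(query) < 80 and "," not in query
    if looks_simple:
        answer, confidence = cheap_model(query)
        if confidence >= confidence_floor:
            return answer
    # Fallback: low confidence or complex input escalates to the strong model.
    answer, _ = expensive_model(query)
    return answer

print(route("what's my balance"))  # tries the cheap model, escalates on low confidence
```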
My rule of thumb
If it needs interpretation or planning: agent.
If it needs correctness, speed, or enforcement: deterministic.
Agents are a powerful layer — especially when you treat them like a component, not the entire system. The best agentic products feel “smart,” but under the hood they’re grounded in constraints, verification, and deliberate decisions about when a model call is actually worth it.