We have all been there. You build an agent. It works perfectly in the demo. You deploy it. And then, on a Tuesday at 3 PM, it decides that the URL for the API documentation is api.stripe.com/v1/users (a 404), and the guess looks so plausible that you waste 20 minutes debugging network errors instead of questioning the URL.
Worse, it says this with 100% confidence.
When we try to fix this today, the industry tells us to use “LLM-as-a-Judge.” We are told to ask GPT-4o to grade GPT-3.5. We are told to fix the “vibes.”
But this creates a dangerous circular dependency. If the underlying models suffer from sycophancy (agreeing with the user) or hallucination, the Judge model shares those failure modes and will happily hallucinate a passing grade.
We are trying to fix probability with more probability. That is a losing game.
I believe we need to stop treating Agents like magic boxes and start treating them like software. Software has assertions. Software has unit tests. Software has return False.
We need to re-introduce Determinism into the stack.
Don’t ask an LLM if a URL is valid. It will hallucinate a 200 OK. Run requests.get().
Don’t ask an LLM if a SQL query is safe. It will miss subtle injections. Parse the AST.
Don’t ask an LLM if “Springfield” is ambiguous. It will guess Illinois. Check the database count.
If the code says “No,” it doesn’t matter how confident the LLM is. The action is blocked.
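Here is what those checks look like as plain code. This is a minimal sketch, not part of any library: the helper names, the cities table, and the database path are illustrative assumptions, and it leans on the ordinary requests and sqlparse packages.

```python
import sqlite3

import requests
import sqlparse


def url_exists(url: str) -> bool:
    # Don't trust the model's claimed "200 OK"; ask the server.
    try:
        return requests.get(url, timeout=5).status_code < 400
    except requests.RequestException:
        return False


def sql_is_safe(query: str) -> bool:
    # Parse the statement instead of asking a model to eyeball it.
    statements = sqlparse.parse(query)
    # Reject stacked queries ("SELECT ...; DROP TABLE ...") and anything
    # that isn't a plain SELECT.
    return len(statements) == 1 and statements[0].get_type() == "SELECT"


def city_is_unambiguous(name: str, db_path: str = "cities.db") -> bool:
    # Hypothetical schema: a `cities` table with a `name` column.
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM cities WHERE name = ?", (name,)
        ).fetchone()
    return count == 1
```

Each check returns a plain boolean, and the agent's action only proceeds if it passes.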
I got tired of debugging these errors by reading logs after the fact. I wanted a firewall that would catch these “Confident Idiot” moments in real-time.
So I built Steer.
It isn’t a heavy observability platform. It’s a simple Python library that wraps your agent functions and enforces hard guardrails.
```python
# The "Steer" way: hard rules.
from steer import capture, RegexVerifier, JsonVerifier  # import path assumed; check the steer-sdk docs

@capture(verifiers=[
    # 1. Enforce SSN format
    RegexVerifier(pattern=r"^\d{3}-\d{2}-\d{4}$"),
    # 2. Block Markdown wrappers (strict JSON parsing)
    JsonVerifier(strict=True),
])
def update_user_profile(data):
    # If the LLM messes up the format, this code never runs.
    # The error is caught, logged, and sent to a dashboard for correction.
    db.update(data)
```

The most interesting part of this experiment isn't just catching the error. It's fixing it.
When Steer catches a failure (like an agent wrapping JSON in Markdown), it doesn’t just crash. It flags the incident in a local dashboard. I can then click “Teach” and inject a specific correction rule (e.g., “System Override: Never use markdown backticks”).
The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.
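Mechanically, the "patch" is just context injection. The sketch below is my own illustration, not steer-sdk's actual API: a list of learned rules gets prepended to the system prompt on every run.

```python
# Illustrative only: these names are not steer-sdk's API.
CORRECTION_RULES: list[str] = [
    "System Override: Never use markdown backticks.",
]


def build_system_prompt(base_prompt: str) -> str:
    # Prepend every learned correction so the next run sees it,
    # without editing the prompt template or redeploying code.
    if not CORRECTION_RULES:
        return base_prompt
    return "\n".join(CORRECTION_RULES) + "\n\n" + base_prompt
```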
I released steer-sdk v0.2 this week. It is open source (Apache 2.0).
It is not a “Platform.” It is a library. It runs locally. It keeps your keys private.
If you are tired of debugging agents based on vibes, check it out. I’d love to know if this deterministic approach matches your experience in production.