Why AI breaks in Production and the Patterns that fix it: Part 1
After shipping multiple AI features that broke in strange ways, I finally found a framework that makes sense.

EDITOR’S NOTE
Dear Nanobits readers,
Last week, I found myself in a packed room at the Google Developer Group meetup in Seattle. Two senior engineers from big tech were presenting on something I had been struggling with for months: why do AI features that work in demos fall apart in production?
I have shipped a few AI features this year. Three broke in ways I couldn't explain: one hallucinated confidently, another became inexplicably slow, and a third just drifted, producing different behavior from the same prompts a few weeks later.
What frustrated me most? No playbook. No structured guidance. Just vibes, copied prompts, and developers quietly dealing with chaos. We have been treating AI product development in a very unscientific way.
That presentation changed everything. For the first time, I saw a coherent framework treating AI product development as actual engineering. Design patterns for probabilistic systems. Reusable solutions that work.
So I am writing a two-part series to cover everything I learned. Today we cover why traditional patterns fail and the behavior patterns that form the foundation. In Part 2, we will tackle retrieval, governance, and composition.
Let’s get started.
THE FOUNDATION HAS CHANGED
Traditional software assumes determinism. Give a function the same input, get the same output. Every time. That consistency makes systems testable, scalable, predictable. Our entire toolkit, from unit tests to monitoring to failure analysis, assumes that once you define a rule, the system follows it.
AI broke that assumption.
Models are probabilistic. Behavior depends on context, phrasing, retrieval results, temperature, and variables we never considered in classical architecture. The variation breaks mental models we have relied on for decades.
A system that shifts behavior based on context means:
Tests aren't proofs of correctness
Edge cases are infinite
Failures have no clear boundaries
The goal isn't to fight probabilistic behavior; it's to wrap LLMs in enough structure that the overall system behaves predictably, even when the model doesn't.

Deterministic vs Probabilistic
Why Now?
78% of companies globally use AI daily. But most teams build without shared vocabulary or structure. That creates chaos.
And the cost of building without patterns:
Prompt sprawl. Prompts copied around, modified, no one knows what's running.
Inconsistent outputs. Different answers to different users. Hallucinations appear randomly.
Escalating costs. Every request hits the expensive model. No intelligent routing.
Impossible to test. Can't version prompts, isolate errors, or monitor quality.
Stuck at prototypes. Demos break under real user inputs.
Wasted engineering time. Developers spend weeks trying to debug the impossible.
The real cost? Systems fail in ways you cannot detect or recover from. Patterns turn AI into engineered products, not unpredictable demos.

WHAT ARE AI-NATIVE DESIGN PATTERNS?
Traditional design patterns assume structured functions and classes. AI-native patterns are different: they are model-centric and operate at the system level.
They address:
How LLMs behave with context
How outputs drift over time
How to manage uncertainty and variability
These are architectural patterns for probabilistic systems.
Their goal isn’t to “tame” the model. It’s to wrap the system with enough structure that the overall application behaves predictably, even when the model doesn’t.
Today we will cover Behavior Patterns. Retrieval, Governance, and Composition will be covered in our Part 2 next week.
BEHAVIOR PATTERNS: THE FOUNDATION
Behavior patterns shape how models think before adding external context. They give structure, reduce variability, and make systems predictable.
Without stable behavior at the prompt level, everything else, including retrieval, orchestration, and safety, becomes exponentially harder.
Pattern #1: Structured Prompting
Reusable templates with clear slots and consistent framing.
When models follow consistent formats, like JSON schemas, fixed fields, and predictable layouts, behavior becomes far less chaotic. Every downstream component depends on structured output.
How it works:
Instead of "What is my store policy for XYZ?", divide prompts into sections:
Role: "You are a store policy assistant."
Task: "Answer customer questions about store policy."
Context: "Use the ABC Policy document."
Format: "Output as JSON: {category, urgency, next_action}"
Constraints: "Abstain when unsure."
Most models now support structured output natively. But even without API support, explicit instructions improve consistency dramatically.

Don't let models improvise structure. Define it upfront. Enforce it consistently.
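Here is a minimal sketch of the idea in Python, assuming a hypothetical call_llm helper in place of a real model client; the template wording and the field names (category, urgency, next_action) are just the example above turned into a reusable template.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder: swap in whatever model client you actually use."""
    raise NotImplementedError

# Hypothetical reusable template with explicit slots: role, task, context,
# format, constraints. The wording and field names are illustrative.
POLICY_PROMPT = """\
Role: You are a store policy assistant.
Task: Answer customer questions about store policy.
Context: Use the {policy_doc} document.
Format: Respond ONLY with JSON: {{"category": "...", "urgency": "...", "next_action": "..."}}
Constraints: If unsure, set "next_action" to "abstain".

Question: {question}
"""

def classify_question(question: str, policy_doc: str = "ABC Policy") -> dict:
    prompt = POLICY_PROMPT.format(policy_doc=policy_doc, question=question)
    raw = call_llm(prompt)
    return json.loads(raw)  # downstream code can rely on this shape
```

The point is that the shape of the output is decided once, in the template, and every downstream consumer can rely on it.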
Pattern #2: N-Shot Prompting
Provide 2-5 input/output examples to steer response structure.
Show, don't tell. Models learn by imitation. High-quality examples, especially covering edge cases, help anchor reasoning and formatting.
Example:
Input: "I can't believe this happened!"
Output: {"mood": "surprised", "intensity": "high"}
Input: "It's fine, I guess."
Output: {"mood": "neutral", "intensity": "low"}
Now classify: "Well, that was unexpected."

Use for complex classification, nuanced tone requirements, and edge case handling. Select examples that span the decision space.
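As a sketch, here is one way to assemble those examples into a chat-style message list before the final input. The role/content dicts follow the common chat-message convention rather than any specific SDK, and the example texts simply mirror the ones above.

```python
import json

# Few-shot examples chosen to span the decision space (surprised, neutral, ...).
FEW_SHOT = [
    ("I can't believe this happened!", {"mood": "surprised", "intensity": "high"}),
    ("It's fine, I guess.", {"mood": "neutral", "intensity": "low"}),
]

def build_messages(text: str) -> list[dict]:
    messages = [{
        "role": "system",
        "content": 'Classify mood. Respond only with JSON: {"mood": ..., "intensity": ...}',
    }]
    for example_input, example_output in FEW_SHOT:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": json.dumps(example_output)})
    messages.append({"role": "user", "content": text})
    return messages

print(build_messages("Well, that was unexpected."))
```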

Pattern #3: Context Framing
Clarify ambiguity. Frame role, rules, and history. Narrow to task-relevant knowledge.
LLMs respond to framing, not just tasks. Vague frames force guessing. Explicit frames create predictability.
Three components make this work:
Reduce ambiguity: Tell the model the audience, tone, constraints, business rules. Don't let it improvise.
Example: "Summarize this contract for non-technical executives. Focus on financial obligations and termination clauses. Plain language. Under 200 words."
Set identity and boundaries: Define the model's role and what it should avoid.
Example: "You are a compliance assistant. Never provide investment advice. Only reference approved regulatory documents."
Right-sized context window: Add only what supports the task. Don't overload it with irrelevant information.

Context framing improves reliability without model changes or fine-tuning. When framing is tight, reasoning focuses, outputs stabilize, and failures become predictable.
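One way to keep framing tight is to assemble the system prompt from named parts instead of free-typing it each time. A minimal sketch, with illustrative field names and example text borrowed from above:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Illustrative container for the three framing components."""
    identity: str   # role and boundaries
    rules: str      # audience, tone, constraints, business rules
    context: str    # only the task-relevant knowledge

    def to_system_prompt(self) -> str:
        return f"{self.identity}\n\nRules:\n{self.rules}\n\nContext:\n{self.context}"

frame = Frame(
    identity="You are a compliance assistant. Never provide investment advice.",
    rules="Plain language. Under 200 words. Only reference approved regulatory documents.",
    context="Relevant excerpts from the approved documents go here, nothing else.",
)
print(frame.to_system_prompt())
```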
Pattern #4: Prompt Versioning & Experimentation
Prompts are code. Version, test, and iterate them like code.
A prompt has behavior, side effects, and can break production. Treat changes accordingly.
Three components to keep in mind:
Explicit Version IDs: Track which version produced which output. Debug regressions. Compare performance. Roll back safely.
Controlled Experimentation: Small edits cause big behavioral shifts. Run A/B tests, shadow tests, canary deployments. Never just replace prompts.
Compare versions against real traffic or evaluation judges. Ask: "Is this measurably better?"
Lifecycle Management: Update prompts as retrieval, memory, or guardrails evolve. Iterate, revert, ship improvements with confidence.

Versioning creates feedback loops. You see what breaks, understand why, fix systematically, improve continuously. It's the difference between shipping AI and staying stuck in prototypes.
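A bare-bones sketch of what version IDs plus a canary split might look like; real setups usually keep prompts in version control or a prompt-management service and attach evaluations, but the mechanics are the same. The registry, the version names, and the 5% canary share here are illustrative assumptions.

```python
import hashlib

# Hypothetical in-memory registry; in practice this lives in version control
# or a prompt-management service, with evaluation results attached.
PROMPTS = {
    "policy_classifier@v3": "Role: store policy assistant. Format: JSON {category, urgency, next_action}.",
    "policy_classifier@v4": "Role: store policy assistant. Format: JSON {category, urgency, next_action, confidence}.",
}

def pick_prompt(user_id: str, canary_share: float = 0.05) -> tuple[str, str]:
    """Send a small, stable share of traffic to the candidate version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    version = "policy_classifier@v4" if bucket < canary_share * 100 else "policy_classifier@v3"
    return version, PROMPTS[version]

version_id, prompt = pick_prompt("user-123")
print(version_id)  # log this next to every output so regressions can be traced back
```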
END NOTE
These four behavior patterns are the foundation of AI-native design: structured prompting, N-shot prompting, context framing, and prompt versioning. They are no longer optional. Not nice-to-haves. This is where the difference between demos and production begins.
So what can you do immediately?
Engineers: Pick one production prompt. Version it. Add structure. Test variations. Measure the difference.
PMs: AI reliability comes from better patterns, not better models. Ask: "How are we versioning prompts? What happens when something breaks?"
Leaders: Invest in prompt infrastructure now. Versioning, evaluation, observability. These are the foundation of shipping AI at scale.
But behavior patterns alone can't solve everything.
They make models more predictable, but they can't stop hallucination when a model doesn't know the answer. They can't handle dynamic knowledge. They can't keep systems safe from adversarial inputs.
So in Part 2, we will cover:
Retrieval Patterns: RAG, memory, and freshness that ground models in real information
Governance Patterns: Guardrails, tracing, and evaluation loops that keep systems safe and observable
Composition: How these patterns work together in production, and what happens when a single query touches eight patterns behind the scenes
The gap between demos and production isn't model quality. It's product and engineering discipline. These patterns close that gap.
See you in Part 2.
If you liked our newsletter, share this link with your friends and ask them to subscribe too.
Check out our website to get the latest updates in AI.